ABSTRACT

While Datalog is a gold standard for denotational query answering, it
does not support value invention or equality constraints. The Datalog±
framework introduced by Gottlob addresses these issues by considering
rules with fresh variables in the head (known as tgds) or equalities in
the head (known as egds). Several tractable classes have been
identified, among which: (S) the class of sticky tgds; (T) the class of
tgds and egds ensuring oblivious termination; and (G) the class of
guarded tgds. In turn, the tractability of these classes typically
relies on the 'chase': (S) ensures that every chase derivation is
'sticky'; (T) ensures polynomial chase termination; and (G) allows
stopping the chase after a fixed 'depth' while preserving completeness.

This paper shows that there are alternative algorithms 
(instead of the chase) that can serve as a basis for the de- 
sign of (larger) tractable classes. As a first contribution, we 
present an algorithm for resolution which is complete for 
any set of tgds and egds (rather than being complete only 
for specific subclasses). We then show that a technique of 
saturation can be used to achieve completeness with respect 
to First-Order (FO) query rewriting. As an application, we 
generalize a few existing classes (including (S)) that ensure 
the existence of a finite FO-rewriting. 

We then consider a more general notion of rewriting, called Datalog
rewriting, and show that it provides a truly unifying paradigm of
tractability for the family of Datalog± languages. While the classes
(S), (T) and (G) are incomparable, we show that every set of rules in
(S), (T) or (G) can be rewritten into an equivalent set of standard
Datalog rules. On the negative side, this means intuitively that
Datalog± does not extend the expressive power of Datalog in the context
of query answering. On the positive side however, one may use the
flexible syntax of Datalog± while using (only) standard Datalog in the
background, thus making use of existing optimization techniques, such
as Magic-Set.

1. INTRODUCTION 

While the language Datalog and its extensions have been studied for
decades (see e.g. [121 [U [T]), they have recently received renewed
attention. In particular, Gottlob et al. introduced a comprehensive and
unifying framework called Datalog± ([101 151 151111]), which is based on
a family of Datalog extensions. One of the main qualities of this
framework lies in its generality. In particular, Datalog± is expressive
enough to cover some interesting classes of ontologies, some
light-weight description logics and some fragments of F-logic (see e.g.
[IS]). It was also argued in [10] that Datalog± is useful in a variety
of contexts, for example Data Exchange [15] and the Semantic Web [7].
The main reason for this is that Datalog±, unlike standard Datalog,
addresses the fundamental problem of value invention by considering
rules with fresh variables in the head. Such rules are known as
tuple-generating dependencies (tgds in short) and correspond to
first-order formulas of the form

$\forall \bar{x}, \bar{y},\ \phi(\bar{x}, \bar{y}) \rightarrow \exists \bar{z},\ \psi(\bar{x}, \bar{z})$

where $\phi$ and $\psi$ are two conjunctions of atoms (that may express
joins) while $\bar{z}$ is a tuple of existentially quantified variables
that can be used to reason about unknown entities. In addition, Datalog±
supports equality-generating dependencies (egds in short) of the form

$\forall x, y, \bar{z},\ \phi(x, y, \bar{z}) \rightarrow x = y$

where $\phi$ is a conjunction of atoms. Tgds and egds can be used to
encode typical schema dependencies such as inclusion dependencies or
functional dependencies (see e.g. [1]) which, in turn, allow reasoning
about structured data. On the negative side, however, the problem of
query answering under arbitrary tgds and egds is undecidable [6], and it
was observed in [8] that the problem remains undecidable even for a
fixed set of tgds.
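
As an illustration of these two shapes, an inclusion dependency can be written as a tgd and a functional dependency as an egd. The relation names Emp and Dept below are hypothetical and chosen only for this sketch; they do not come from the paper.

```latex
% Inclusion dependency as a tgd: every department referenced by an
% employee has some (possibly unknown) manager m.
\forall e, d,\ \mathrm{Emp}(e, d) \rightarrow \exists m,\ \mathrm{Dept}(d, m)

% Functional dependency as an egd: a department has at most one manager.
\forall m, m', d,\ \mathrm{Dept}(d, m) \wedge \mathrm{Dept}(d, m') \rightarrow m = m'
```

In the first formula the existential variable $m$ plays the role of $\bar{z}$; in the second, $m$ and $m'$ play the roles of $x$ and $y$, and $d$ the role of $\bar{z}$.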

To avoid undecidability, the Datalog± framework typically considers (in
[10]) three alternative restrictions called termination, guardedness and
stickiness, defined as follows:

Termination: There exists a chase procedure [15, 14, 8, 20] that, for a
given database, computes a universal solution in polynomial time (data
complexity) which, in turn, can be used for sound and complete query
answering.

Guardedness: The set of dependencies consists of guarded tgds and
separable egds, in which case, as shown in [8], it is possible to reach
tractability, while remaining complete, by stopping the oblivious chase
after a fixed depth.

Stickiness: The set of dependencies is a set of tgds (and separable
egds) ensuring that every chase derivation is sticky [11], meaning
intuitively that the fresh variables introduced by the chase are only
propagated in a harmless way.



The three classes discussed above are unfortunately incomparable, and
the only property that really unifies them in [10] is the fact that they
are all based on the chase, either in their definition or in terms of
properties. This approach has several advantages and it contributes, in
particular, to the simplification of the 'big picture'. However, there
are alternative ways of approaching the above classes. For example, the
guarded fragment can be decided, at least in some cases, using tableaux
algorithms or resolution algorithms (see e.g. [24, 19, 17, 16]).
Similarly, the criterion of stickiness can be understood as a criterion
ensuring the termination of resolution (as opposed to a specific
property of the chase). In fact, a custom resolution procedure was
proposed in [11] for the case of stickiness. However, this procedure
relies on the specific properties of sticky tgds, and it is relevant
only for this class of dependencies. In contrast with [11], a
contribution of this paper will be to consider a resolution algorithm
which is defined (and complete) for arbitrary sets of tgds and egds, so
as to identify larger tractable classes.

While exploring some alternatives to the chase procedure, this paper
follows a similar methodology to that of Datalog± in the sense that it
aims at identifying a unifying paradigm. Towards this goal, we will
consider, besides resolution, several notions of rewriting:

• Data Rewriting (a.k.a. Universal Solutions).
Given a database $\mathcal{D}$ and a set of dependencies $\Sigma$, a data
rewriting for $\mathcal{D}$ is a new database $\mathcal{D}_\Sigma$ which
integrates all the information that can be inferred from $\Sigma$. Given
such a rewriting $\mathcal{D}_\Sigma$, we can then test whether a
conjunctive query $Q$ is implied by $\mathcal{D}$ and $\Sigma$ by testing
whether $Q$ is implied by $\mathcal{D}_\Sigma$. This technique can be
used whenever the chase terminates since a universal solution is in fact
a special case of data rewriting.

• Query Rewriting (a.k.a. First-Order Rewriting).
Given a query $Q$ and a set of dependencies $\Sigma$, a query rewriting
for $(\Sigma, Q)$ is a query $Q_\Sigma$ (in a first-order language) which
integrates all the information encoded in $\Sigma$. Given such a
rewriting $Q_\Sigma$, we can test whether $Q$ is implied by $\Sigma$ and
a database $\mathcal{D}$ by testing whether $Q_\Sigma$ is implied by
$\mathcal{D}$. As already observed in [11], the technique of query
rewriting can be used whenever $\Sigma$ is a set of sticky tgds.

• Datalog Rewriting (The Unifying Paradigm).
Given a conjunctive query $Q$ and a set of dependencies $\Sigma$, a
Datalog rewriting for $(\Sigma, Q)$ consists of a new pair
$(\Sigma', Q')$ where $\Sigma'$ is a set of standard Datalog rules
(without fresh variables in the head) and $Q'$ is a query such that
$(\Sigma', Q')$ is equivalent to $(\Sigma, Q)$ with respect to query
answering.

As we will show in this paper, the technique of Datalog rewriting is not
only a strict generalisation of (first-order) query rewriting, but it
can also be used in the case of terminating dependencies (instead of
data rewriting) and in the case of guarded dependencies (instead of
relying on the chase). In this sense, Datalog rewriting is therefore a
truly unifying paradigm for the family of Datalog± languages. Also,
Datalog rewriting has proved useful for classes of dependencies that are
not (yet) covered by the Datalog± framework, in particular in the
context of Description Logic (see e.g. [25]). In this sense, heuristics
for Datalog rewriting can be used to further generalize the Datalog±
framework.

Structure and Main Contributions 

The preliminary section formalizes the problem of query an- 
swering (under constraints) as a basic implication problem. 
For the sake of concision and symmetry, the key notions 
of databases, queries and dependencies are all based on sets
of atoms with variables only (called instances). Constants, 
null values, and non-boolean queries with equality atoms are 
discussed in Section 3.5. 

(1) In Section 3.1, we propose a concise definition of resolu- 
tion for arbitrary tgds and egds and show that it is complete 
for query answering. In contrast with alternative approaches 
(such as [23]), this definition does not rely on skolemization 
and unskolemization. Instead, it relies simply on instances 
and renamings (a.k.a. homomorphisms). 

(2) In Section 3.2, we show that a technique of saturation 
can be used to reach completeness with respect to finite 
rewritability. More precisely, the saturated chase (like the 
core chase of [14]) computes a finite data rewriting when- 
ever there exists one. Similarly, the saturated resolution 
computes a finite query rewriting whenever there is one. 

(3) In Section 3.3, we focus on tgds and revisit the notion 
of stickiness. We observe that there is a reduction from the 
class of sticky tgds to the class of lossless tgds which, in fact, 
can be used to identify larger classes of rewritable queries.

(4) In Section 3.4, we show that query rewriting under egds 
(as opposed to tgds only) is more complex. We then propose 
a heuristic for the integration of a set of egds into a set of 
tgds, thus making it possible (in some cases) to rely on stickiness and
query rewriting despite the presence of conflicting egds. 

(5) In Section 4.1, we show that Datalog rewriting captures the class of
dependencies ensuring the termination of the oblivious chase (from
[20]), and as a special case, the class of weakly acyclic dependencies
(from [15]).

(6) In Section 4.2, we finally show that Datalog rewriting also
captures the class of guarded tgds. This result is the most
technical and arguably the most important contribution. 

The proofs (or more accurately, the proof sketches) have been included
in the body of the paper. Readers unfamiliar with the problem of query
answering under tgds and egds are invited to consult e.g. [10] for more
examples, applications and related work. Additional proof details (for
some of the proofs) can be found in [21].

2. PRELIMINARIES 

We adopt the infinite model semantics. Recall however that it coincides
with the finite model semantics under either termination ([14, 20]),
guardedness ([8]), or stickiness ([11]).

Instances and Dependencies. Let $V$ be a countable set of variables and
let $\sigma$ be a finite set of predicates. We assume that each
$R \in \sigma$ comes with a fixed and finite arity $a_R$. An instance $I$
is a set of atoms $R(\bar{v})$ where $R \in \sigma$ and $\bar{v}$ is a
tuple of variables respecting the arity of $R$, that is,
$\bar{v} \in V^{a_R}$. We let $V_I$ be the set of variables occurring in
an instance $I$. A tgd $r$ is a rule $B \rightarrow H$ where $B$ and $H$
are two finite instances called the body and the head of $r$,
respectively. We say that a tgd $r$ is of the form
$B(X, Y) \rightarrow H(X, Z)$ when $X = V_B \cap V_H$,
$Y = V_B \setminus X$, and $Z = V_H \setminus X$. Such a tgd $r$ is
called a Datalog rule when $Z = \emptyset$. An egd $r$ is a rule
$B \rightarrow x = y$ where $B$ is a finite instance and
$x, y \in V_B$. An egd $r$ is of the form
$B(x, y, Z) \rightarrow x = y$ when $Z = V_B \setminus \{x, y\}$.

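Semantics. A renaming is a mapping from $V$ to $V$. Given an instance
$I$ and a renaming $\theta$, we let $I[\theta]$ be the set of atoms
$R(\theta(t_1), \ldots, \theta(t_n))$ where $R(t_1, \ldots, t_n) \in I$.
Given two instances $I$ and $J$ we let $I \models J$ when
$J[\theta] \subseteq I$ for some renaming $\theta$. Given an instance $I$
and a set $\mathcal{J}$ of instances, we let $I \models \mathcal{J}$ when
$I \models J$ for some $J \in \mathcal{J}$. Given two sets $\mathcal{I}$
and $\mathcal{J}$ of instances, we let $\mathcal{I} \models \mathcal{J}$
when $I \models \mathcal{J}$ for all $I \in \mathcal{I}$. Note that these
definitions are consistent with respect to singletons. In particular, we
have $I \models J$ iff $\{I\} \models \{J\}$. Given two (sets of)
instances $\mathcal{I}$ and $\mathcal{J}$ we write
$\mathcal{I} \equiv \mathcal{J}$ when both
$\mathcal{I} \models \mathcal{J}$ and $\mathcal{J} \models \mathcal{I}$.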

Given $X \subseteq V$, a renaming of $X$ is a renaming $\theta$ such that
$\theta(v) = v$ for all $v \in (V \setminus X)$. In the following, we use
the notation $\theta_X$ to indicate that $\theta_X$ is a renaming of $X$.
Given $Y \subseteq V$ disjoint from $X$, and given a renaming
$\theta_Y$, we denote by $[\theta_X, \theta_Y]$ the renaming of
$X \cup Y$ that coincides with $\theta_X$ on $X$ and with $\theta_Y$ on
$Y$.

Given an instance $I$ and a tgd $r : B(X, Y) \rightarrow H(X, Z)$, we
let $I \models r$ when for all $\theta_X$ and $\theta_Y$ such that
$B[\theta_X, \theta_Y] \subseteq I$, there exists $\theta_Z$ such that
$H[\theta_X, \theta_Z] \subseteq I$. Given an instance $I$ and an egd
$r : B(x, y, Z) \rightarrow x = y$, we write $I \models r$ when for all
$\theta_x$, $\theta_y$ and $\theta_Z$ such that
$B[\theta_x, \theta_y, \theta_Z] \subseteq I$, it holds that
$\theta_x(x) = \theta_y(y)$. Given a set $\Sigma$ of tgds and egds, we
finally write $I \models \Sigma$ when $I \models r$ for all tgds and
egds $r \in \Sigma$.

Note that, for two instances $I$ and $J$, the property $I \models J$
corresponds to several equivalent intuitions: the boolean conjunctive
query $J$ is true in the database $I$; the query $I$ is contained in the
query $J$; the formula $J$ is implied by $I$; the instance $I$ is a model
of $J$; or there is a homomorphism from $J$ to $I$. A similar comment
holds for sets of instances.
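
The relation $I \models J$ can be checked directly from its definition. The following is a minimal brute-force sketch, not an optimized procedure: it assumes that an atom is encoded as a Python tuple (predicate name followed by variable names) and an instance as a set of such tuples. The helper names `variables`, `rename` and `entails` are introduced here for illustration only.

```python
from itertools import product

def variables(instance):
    """Variables occurring in a set of atoms (atom = (predicate, v1, ..., vn))."""
    return {t for atom in instance for t in atom[1:]}

def rename(instance, theta):
    """I[theta]: apply a renaming, given as a dict that is the identity elsewhere."""
    return {(a[0],) + tuple(theta.get(t, t) for t in a[1:]) for a in instance}

def entails(I, J):
    """True iff I |= J, i.e. J[theta] is a subset of I for some renaming theta."""
    j_vars = sorted(variables(J))
    targets = sorted(variables(I))
    # Enumerate every assignment of the variables of J to variables of I.
    for image in product(targets, repeat=len(j_vars)):
        theta = dict(zip(j_vars, image))
        if rename(J, theta) <= set(I):
            return True
    return False

# A 2-cycle entails a path of length 2, but not the other way round.
cycle = {("E", "a", "b"), ("E", "b", "a")}
path = {("E", "x", "y"), ("E", "y", "z")}
print(entails(cycle, path), entails(path, cycle))
```

The enumeration is exponential in the number of variables of $J$, which is consistent with the fact that the test amounts to searching for a homomorphism from $J$ to $I$.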

In the following definition, $\mathcal{D}$ and $\mathcal{Q}$ denote two
sets of instances and $\Sigma$ denotes a set of dependencies.
Intuitively: $\mathcal{D}$ corresponds to a database or a set of
databases (representing a set of possible models $M$); $\Sigma$
corresponds to an ontology or a set of structural dependencies; and
$\mathcal{Q}$ corresponds to a union of boolean conjunctive queries.

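Definition 1. We say that $\mathcal{Q}$ is certain in $\mathcal{D}$
under $\Sigma$, denoted $\mathcal{D} \wedge \Sigma \models \mathcal{Q}$,
iff, for all instances $M$ such that $M \models \mathcal{D}$ and
$M \models \Sigma$, we have $M \models \mathcal{Q}$.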

Chase. We next recall the definition of the chase [15, 14, 8, 20]. More
precisely, the following definition coincides with [8] in the case of
tgds and with [15] in the case of egds.

Given an instance $I$ and a set of variables $Z$, we say that
$\theta_Z$ is an $I$-fresh renaming when $\theta_Z$ is a renaming of
$Z$, $\theta_Z$ is injective on $Z$ and its image $\theta_Z(Z)$ is
disjoint from $V_I$. Given two variables $u$ and $v$ we denote by
$[u \leftarrow v]$ the unique renaming $\theta$ of $\{v\}$ satisfying
$\theta(v) = u$. Given an instance $I$, and a set $\Sigma$ of tgds and
egds, we write $I \xrightarrow{\mathrm{chase}}_{\Sigma} J$ when either
$J = I$ or one of the following rules applies:



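(tgd) There is a tgd $r : B(X, Y) \rightarrow H(X, Z)$ in $\Sigma$ and
some renamings $\theta_X$ and $\theta_Y$ such that
$B[\theta_X, \theta_Y] \subseteq I$ and
$J = I \cup H[\theta_X, \theta_Z]$ for some $I$-fresh renaming
$\theta_Z$.

(egd) There is an egd $r : B(x, y, Z) \rightarrow x = y$ in $\Sigma$ and
some renamings $\theta_x$, $\theta_y$, and $\theta_Z$ such that
$B[\theta_x, \theta_y, \theta_Z] \subseteq I$ and
$J = I[\theta_y(y) \leftarrow \theta_x(x)]$.

Using the symbol $*$ for the transitive closure, we finally write
$J \in \mathrm{Chase}(I, \Sigma)$ when
$I \xrightarrow{\mathrm{chase}}^{*}_{\Sigma} J$.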
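
The two chase rules can be sketched as follows, under the same illustrative encoding as before (atoms as tuples, instances as sets of tuples). The function names `tgd_step`, `egd_step` and `fresh_names`, the `_n` prefix used for fresh variables, and the Emp/Dept example are assumptions of this sketch rather than part of the paper. Each function applies one rule for a renaming that the caller supplies explicitly.

```python
from itertools import count

def variables(instance):
    """Variables occurring in a set of atoms (atom = (predicate, v1, ..., vn))."""
    return {t for atom in instance for t in atom[1:]}

def rename(instance, theta):
    """I[theta]: apply a renaming, given as a dict that is the identity elsewhere."""
    return {(a[0],) + tuple(theta.get(t, t) for t in a[1:]) for a in instance}

def fresh_names(I):
    """Yield pairwise distinct variable names that do not occur in I."""
    used = variables(I)
    for i in count():
        name = f"_n{i}"
        if name not in used:
            yield name

def tgd_step(I, body, head, theta):
    """(tgd) rule: J = I ∪ H[theta_X, theta_Z] for an I-fresh theta_Z."""
    if not rename(body, theta) <= I:
        raise ValueError("the body does not match I under theta")
    fresh = fresh_names(I)
    z_vars = sorted(variables(head) - variables(body))
    theta_z = {z: next(fresh) for z in z_vars}
    return I | rename(head, {**theta, **theta_z})

def egd_step(I, body, x, y, theta):
    """(egd) rule: J = I[theta(y) <- theta(x)], i.e. theta(x) is replaced by theta(y)."""
    if not rename(body, theta) <= I:
        raise ValueError("the body does not match I under theta")
    return rename(I, {theta[x]: theta[y]})

I = {("Emp", "alice", "sales"), ("Dept", "sales", "bob")}
# tgd: Emp(e, d) -> exists m, Dept(d, m)
J1 = tgd_step(I, [("Emp", "e", "d")], [("Dept", "d", "m")],
              {"e": "alice", "d": "sales"})
# egd: Dept(d, m), Dept(d, m2) -> m = m2
J2 = egd_step(J1, [("Dept", "d", "m"), ("Dept", "d", "m2")], "m", "m2",
              {"d": "sales", "m": "_n0", "m2": "bob"})
print(sorted(J1))
print(sorted(J2))
```

In this example the tgd step invents a fresh manager for the sales department, and the egd step then merges it with the existing one, so the second instance coincides with the initial one.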

3. FIRST-ORDER RESOLUTION 
3.1 Definition and Completeness 

We next propose a definition of resolution for tgds and egds which,
unlike alternative approaches (such as [23]), does not require any
complex algorithm of (un)skolemization.

Given an instance $Q$ and a set $\Sigma$ of tgds and egds, we write
$Q \xrightarrow{\mathrm{resol}}_{\Sigma} R$ when one of the following
rules applies:

(ren) $R = Q[\theta]$ for some renaming $\theta$.

(tgd) There is a tgd $B(X, Y) \rightarrow H(X, Z)$ in $\Sigma$ and three
renamings $\theta_X$, $\theta_Y$, $\theta_Z$ such that

$R = (Q \setminus H[\theta_X, \theta_Z]) \cup B[\theta_X, \theta_Y]$

and the following conditions hold:

- there is at least one atom in $Q \cap H[\theta_X, \theta_Z]$;

- $\theta_Z$ is an injection from $Z$ to $(V \setminus V_R)$.

(egd) There is an egd $B(x, y, Z) \rightarrow x = y$ in $\Sigma$ and
three substitutions $\theta_x$, $\theta_y$, $\theta_Z$, such that
$\theta_x(x)$ occurs several times in $B$,
$\theta_x(x) \neq \theta_y(y)$, and $R$ is obtained by:

- first renaming some occurrences of $\theta_x(x)$ into $\theta_y(y)$;

- and then adding the atoms of $B[\theta_x, \theta_y, \theta_Z]$.

We finally write $R \in \mathrm{Resol}(Q, \Sigma)$ when
$Q \xrightarrow{\mathrm{resol}}^{*}_{\Sigma} R$.
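
The (tgd) rule and its two side conditions can be illustrated with a small sketch. It assumes the same tuple-based encoding of atoms as in the earlier sketches; the function name `resolve_tgd` and the example predicates are hypothetical. The renaming `theta` passed by the caller stands for the combined renaming $[\theta_X, \theta_Y, \theta_Z]$.

```python
def variables(instance):
    """Variables occurring in a set of atoms (atom = (predicate, v1, ..., vn))."""
    return {t for atom in instance for t in atom[1:]}

def rename(instance, theta):
    """I[theta]: apply a renaming, given as a dict that is the identity elsewhere."""
    return {(a[0],) + tuple(theta.get(t, t) for t in a[1:]) for a in instance}

def resolve_tgd(Q, body, head, theta):
    """Apply the (tgd) resolution rule, or return None if a side condition fails."""
    head_img = rename(head, theta)
    body_img = rename(body, theta)
    # Condition 1: at least one atom of Q is resolved against the head.
    if not (Q & head_img):
        return None
    R = (Q - head_img) | body_img
    # Condition 2: theta_Z is injective and its image avoids the variables of R.
    z_vars = sorted(variables(head) - variables(body))
    z_img = [theta.get(z, z) for z in z_vars]
    if len(set(z_img)) != len(z_img) or set(z_img) & variables(R):
        return None
    return R

# tgd: C(x) -> exists z, A(x, z)
body, head = {("C", "x")}, {("A", "x", "z")}
theta = {"x": "u", "z": "w"}

# The query A(u, w) resolves into C(u).
R1 = resolve_tgd({("A", "u", "w")}, body, head, theta)
# Here w also occurs in B(w), which remains in R, so the rule does not apply.
R2 = resolve_tgd({("A", "u", "w"), ("B", "w")}, body, head, theta)
print(R1, R2)
```

The second call shows the role of the injectivity condition: an existential variable of the head may only be matched with a query variable that disappears together with the resolved atoms.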

Theorem 1. For all sets $\Sigma$ of tgds and egds, and for all
instances $D$ and $Q$, the following statements are equivalent:

(1) $D \wedge \Sigma \models Q$

(2) $\exists U \in \mathrm{Chase}(D, \Sigma),\ U \models Q$

(3) $\exists R \in \mathrm{Resol}(Q, \Sigma),\ D \models R$

Proof Sketch. The equivalence between (1) and (2) is well-known. Recall
however that it holds only under the infinite model semantics (see e.g.
[2] for more details). The implication (3) $\Rightarrow$ (1) means that
the resolution is sound, and it is fairly straightforward. We next prove
that (2) implies (3). We proceed by induction, and show that the
following property holds for all $n \geq 0$:

If there is a chase derivation of length $n$ of the form

$D = I_0 \xrightarrow{\mathrm{chase}}_{\Sigma} I_1 \xrightarrow{\mathrm{chase}}_{\Sigma} \cdots I_n$ where $I_n \models Q$

then there exists $R \in \mathrm{Resol}(Q, \Sigma)$ such that
$D \models R$.

In the case $n = 0$, the property holds with $R = Q$. Assume now the
property true for $n - 1$ and consider a derivation of length $n$ as
above. Let $h$ be a renaming such that $Q[h] \subseteq I_n$ and let
$Q' = Q[h]$. Because of the resolution rule (ren), we have
$Q \xrightarrow{\mathrm{resol}}_{\Sigma} Q'$. We now distinguish two
cases:
(1) Assume that $I_n$ is obtained by chasing a tgd

$B(X, Y) \rightarrow H(X, Z)$

and consider $\theta_X, \theta_Y, \theta_Z$ such that
$B[\theta_X, \theta_Y] \subseteq I_{n-1}$ and
$I_n = I_{n-1} \cup H[\theta_X, \theta_Z]$. If
$Q' \subseteq I_{n-1}$, we can apply the induction hypothesis for
$n - 1$. Otherwise, there is at least one atom in
$Q' \cap H[\theta_X, \theta_Z]$ and we have
$Q' \xrightarrow{\mathrm{resol}}_{\Sigma} R$ for

$R = (Q' \setminus H[\theta_X, \theta_Z]) \cup B[\theta_X, \theta_Y]$.

Since $R \subseteq I_{n-1}$ we have $I_{n-1} \models R$ and we can
easily conclude the inductive step.

(2) Assume that $I_n$ is obtained by chasing an egd

$B(x, y, Z) \rightarrow x = y$

and consider $(\theta_x, \theta_y, \theta_Z)$ such that
$B[\theta_x, \theta_y, \theta_Z] \subseteq I_{n-1}$ and
$I_n = I_{n-1}[\theta']$ for
$\theta' = [\theta_y(y) \leftarrow \theta_x(x)]$. For each atom
$A \in Q'$, choose an atom $c(A) \in I_{n-1}$ such that
$\theta'(c(A)) = A$ and let
$R = \{c(A) \mid A \in Q'\} \cup B[\theta_x, \theta_y, \theta_Z]$. We
can finally check that $Q' \xrightarrow{\mathrm{resol}}_{\Sigma} R$ and
conclude the inductive step. □

3.2 Rewriting and Saturation 

We next formalize the notions of data rewriting and query rewriting
before showing that completeness (with respect to finite rewritability)
can be reached, in both cases, by saturating the relation
$\xrightarrow{\mathrm{chase}}$ or $\xrightarrow{\mathrm{resol}}$ in a
rather natural way.

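Definition 2. Consider a set $\Sigma$ of dependencies and two sets
$\mathcal{D}$ and $\mathcal{Q}$ of instances. A data rewriting for
$(\mathcal{D}, \Sigma)$ is a set of instances $\mathcal{U}$ such that,
for all sets $\mathcal{Q}'$ of instances:

$(\mathcal{D} \wedge \Sigma \models \mathcal{Q}')$ iff $(\mathcal{U} \models \mathcal{Q}')$.

Similarly, a query rewriting for $(\Sigma, \mathcal{Q})$ is a set of
instances $\mathcal{R}$ such that, for all sets $\mathcal{D}'$ of
instances:

$(\mathcal{D}' \wedge \Sigma \models \mathcal{Q})$ iff $(\mathcal{D}' \models \mathcal{R})$.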

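Saturation. Let
$\xrightarrow{x} \in \{\xrightarrow{\mathrm{chase}}, \xrightarrow{\mathrm{resol}}\}$.
Given two finite sets of instances $\mathcal{I}$ and $\mathcal{J}$, we
write $\mathcal{I} \xrightarrow{x}_{\Sigma} \mathcal{J}$ when,
intuitively, $\mathcal{J}$ is a concise representation (up to
equivalence) of all the instances $J$ such that
$I \xrightarrow{x}_{\Sigma} J$ for some $I \in \mathcal{I}$. More
formally, we let $\mathcal{I} \xrightarrow{x}_{\Sigma} \mathcal{J}$ when
$\mathcal{J}$ can be returned by the following non-deterministic
algorithm:

Start with $\mathcal{J} := \mathcal{I}$.
Repeat (until fixed point)
    If there is an instance $J$ such that:
        (1) $I \xrightarrow{x}_{\Sigma} J$ for some $I \in \mathcal{I}$; and
        (2.1) we are in the case $\xrightarrow{x} = \xrightarrow{\mathrm{resol}}$ and
              there is no $J' \in \mathcal{J}$ such that $J \models J'$; or
        (2.2) we are in the case $\xrightarrow{x} = \xrightarrow{\mathrm{chase}}$ and
              there is no $J' \in \mathcal{J}$ such that $J' \models J$
    Then let $\mathcal{J} := \mathcal{J} \cup \{J\}$
Return $\mathcal{J}$.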

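Note that the above definition can easily be translated into an actual
algorithm since the set of instances $J$ satisfying the point (1) is
finite up to isomorphism (and can be enumerated in exponential time). We
define a complete derivation for $\mathcal{I}$ and
$\xrightarrow{x}_{\Sigma}$ as an infinite series
$(\mathcal{I}_i)_{i \geq 0}$ where $\mathcal{I}_0 = \mathcal{I}$ and
$\mathcal{I}_{i-1} \xrightarrow{x}_{\Sigma} \mathcal{I}_i$ for each
$i > 0$. We finally let
$\mathcal{J} \in \mathrm{Fix}(\mathcal{I}, \xrightarrow{x}_{\Sigma})$
when $\mathcal{J} = \bigcup_i \mathcal{I}_i$ for some complete
derivation $(\mathcal{I}_i)_{i \geq 0}$.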



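Theorem 2. For all sets $\Sigma$ of tgds and egds, and for all sets
$\mathcal{D}$ and $\mathcal{Q}$ of instances:

• Every $\mathcal{U} \in \mathrm{Fix}(\mathcal{D}, \xrightarrow{\mathrm{chase}}_{\Sigma})$ is a data rewriting of $(\mathcal{D}, \Sigma)$.

• Every $\mathcal{R} \in \mathrm{Fix}(\mathcal{Q}, \xrightarrow{\mathrm{resol}}_{\Sigma})$ is a query rewriting of $(\Sigma, \mathcal{Q})$.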

Proof Sketch. The result would follow directly from Theorem 1 if we had
removed the requirements (2.1) and (2.2) in the above definition of
saturation. To prove that the result still holds with (2.1) and (2.2),
it is enough to observe that the chase and the resolution are both
monotone:

• If $J \models J'$ and $J' \xrightarrow{\mathrm{chase}}_{\Sigma} K'$, then there exists an instance
$K$ such that $J \xrightarrow{\mathrm{chase}}_{\Sigma} K$ and $K \models K'$.

• If $J' \models J$ and $J' \xrightarrow{\mathrm{resol}}_{\Sigma} K'$, then there exists an instance
$K$ such that $J \xrightarrow{\mathrm{resol}}^{*}_{\Sigma} K$ and $K' \models K$.

The monotonicity of the chase is well-known (see e.g. [20]). For the
monotonicity of the resolution, consider $J' \models J$ and
$J' \xrightarrow{\mathrm{resol}}_{\Sigma} K'$. The case where $K'$ is
obtained from $J'$ with the resolution rule (ren) is straightforward.
Consider now the case (tgd) and unfold the definition of resolution so
that

$K' = (J' \setminus H[\theta_X, \theta_Z]) \cup B[\theta_X, \theta_Y]$.

Let $h$ be a renaming of $V_J$ such that $J[h] \subseteq J'$. If $J[h]$
does not intersect $H[\theta_X, \theta_Z]$ we have
$J[h] \subseteq K'$ and the property holds for $K = K'$. Otherwise,
considering the instance

$K = (J[h] \setminus H[\theta_X, \theta_Z]) \cup B[\theta_X, \theta_Y]$,

we can check that
$J \xrightarrow{\mathrm{resol}}_{\Sigma} J[h] \xrightarrow{\mathrm{resol}}_{\Sigma} K$
and $K[h] \subseteq K'$. Consider finally the case (egd) and unfold the
definition of resolution for an egd $B(x, y, Z) \rightarrow x = y$ of
$\Sigma$ and three substitutions $\theta_x$, $\theta_y$, $\theta_Z$. For
each atom $A \in J'$, let $c(A)$ be the atom of $K'$ that has been
obtained from $A$ by renaming some occurrences of $\theta_x(x)$ into
$\theta_y(y)$, and observe that:

$K' = \{c(A) \mid A \in J'\} \cup B[\theta_x, \theta_y, \theta_Z]$.

Let $h$ be a renaming of $V_J$ such that $J[h] \subseteq J'$ and let

$K = \{c(A) \mid A \in J[h]\} \cup B[\theta_x, \theta_y, \theta_Z]$.

As in the previous case, we can finally check that we have
$J \xrightarrow{\mathrm{resol}}_{\Sigma} J[h] \xrightarrow{\mathrm{resol}}_{\Sigma} K$
and $K[h] \subseteq K'$. This concludes the proof of monotonicity and
the proof of Theorem 2. □

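Termination. Let $(\mathcal{I}_i)_{i \geq 0}$ be a complete derivation
of $\mathcal{I}$ and $\xrightarrow{x}_{\Sigma}$. We say that this
derivation terminates iff there exists a finite $i$ such that
$\mathcal{I}_{i+1} = \mathcal{I}_i$. Note that, in this case, we also
have $\mathcal{I}_j = \mathcal{I}_i$ for all $j > i$. We finally say
that $\mathrm{Fix}(\mathcal{I}, \xrightarrow{x}_{\Sigma})$ is finite
when all the sets
$\mathcal{J} \in \mathrm{Fix}(\mathcal{I}, \xrightarrow{x}_{\Sigma})$
are finite, meaning (equivalently) that all the derivations of
$\mathcal{I}$ by $\xrightarrow{x}_{\Sigma}$ are finite.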

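Theorem 3. Given a finite set $\Sigma$ of tgds and two finite sets
$\mathcal{D}$ and $\mathcal{Q}$ of instances, the following statements
hold:

• If there exists a finite data rewriting for $(\mathcal{D}, \Sigma)$,
then $\mathrm{Fix}(\mathcal{D}, \xrightarrow{\mathrm{chase}}_{\Sigma})$ is finite.

• If there exists a finite query rewriting for $(\Sigma, \mathcal{Q})$,
then $\mathrm{Fix}(\mathcal{Q}, \xrightarrow{\mathrm{resol}}_{\Sigma})$ is finite.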
Proof Sketch. The two points are similar and we prove here the second
one. Suppose that there exists a finite query rewriting $\mathcal{R}$
for $(\Sigma, \mathcal{Q})$ and consider a complete derivation
$(\mathcal{I}_i)_{i \geq 0}$ for $\mathcal{Q}$ and
$\xrightarrow{\mathrm{resol}}_{\Sigma}$. Since $\mathcal{R}$ and
$\bigcup_i \mathcal{I}_i$ are two rewritings of $(\Sigma, \mathcal{Q})$,
we have $\mathcal{R} \equiv \bigcup_i \mathcal{I}_i$. Since
$\mathcal{R}$ is finite, there exists a finite $k$ such that
$\mathcal{R} \equiv \mathcal{I}_k$. We can then observe that
$\mathcal{I}_k = \mathcal{I}_{k+1}$ and conclude that
$(\mathcal{I}_i)_{i \geq 0}$ is finite. □

Note that a direct consequence of Theorems 2 and 3 is the following: as
soon as there is one finite fixed point in
$\mathrm{Fix}(\mathcal{D}, \xrightarrow{\mathrm{chase}}_{\Sigma})$ or
$\mathrm{Fix}(\mathcal{Q}, \xrightarrow{\mathrm{resol}}_{\Sigma})$, it
is the case that all the fixed points are finite. In particular, this
means that all saturation strategies are equivalent with respect to
termination, both in the case of $\xrightarrow{\mathrm{chase}}$ and
$\xrightarrow{\mathrm{resol}}$.

3.3 Rewritable Classes of Tgds 

This section illustrates how the resolution procedure from Section 3.1
can be used to 'simplify' the big picture on query rewritability. In
particular, we first show that it provides a concise proof of finite
query-rewritability of a few well-known classes of dependencies. In
fact, the results stated in the following proposition are rather
well-known. It is however interesting to compare the following proof
sketch (based on resolution, and arguably simple) with, for example, the
seminal paper of Johnson and Klug [18] where a proof was given (based on
the chase, and arguably complex) for the tractability of inclusion
dependencies. This class of dependencies is indeed covered (strictly) by
the class of local-as-view tgds (lav tgds) defined in the following
proposition.

Proposition 1. In each of the following cases, we can effectively
compute a finite query rewriting for $(\Sigma, \mathcal{Q})$:

• (Lav tgds) $\Sigma$ is a set of tgds $B \rightarrow H$ where the body
$B$ contains at most one atom.

• (Lossless tgds) $\Sigma$ is a set of tgds $B \rightarrow H$ where, for
each atom $A_h \in H$, we have $V_B \subseteq V_{A_h}$.

• (Acyclicity) $\Sigma$ is a set of tgds and there is a linear order
$<_\Sigma$ on the predicates of $\Sigma$ such that, for all tgds
$B \rightarrow H$ in $\Sigma$, all predicates $R_b$ occurring in $B$,
and all predicates $R_h$ occurring in $H$, we have $R_h <_\Sigma R_b$.
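
The three syntactic conditions of Proposition 1 are easy to test on a given set of tgds. The sketch below assumes that a tgd is encoded as a pair `(body, head)` of lists of atoms and that an atom is a tuple (predicate name followed by variable names); the function names and the sample tgds are illustrative only. For acyclicity it relies on the standard fact that a suitable linear order exists exactly when the graph with an edge from each body predicate to each head predicate of the same tgd has no cycle.

```python
def variables(atoms):
    """Variables occurring in a collection of atoms."""
    return {t for a in atoms for t in a[1:]}

def is_lav(tgds):
    """Every body contains at most one atom."""
    return all(len(body) <= 1 for body, _ in tgds)

def is_lossless(tgds):
    """Every head atom contains all the variables of the body."""
    return all(variables(body) <= set(atom[1:])
               for body, head in tgds for atom in head)

def is_acyclic(tgds):
    """No cycle in the graph with an edge (body predicate -> head predicate)."""
    edges = {(b[0], h[0]) for body, head in tgds for b in body for h in head}
    nodes = {p for edge in edges for p in edge}
    while nodes:
        # Predicates with no incoming edge can be placed next in the order.
        sources = {n for n in nodes if all(dst != n for _, dst in edges)}
        if not sources:
            return False  # every remaining predicate lies on or behind a cycle
        nodes -= sources
        edges = {(s, d) for s, d in edges if s not in sources}
    return True

t1 = ([("Emp", "e", "d")], [("Dept", "d", "m")])            # lav, acyclic
t2 = ([("P", "x", "y"), ("Q", "y")], [("S", "x", "y", "z")])  # lossless, acyclic
t3 = ([("E", "x", "y")], [("E", "y", "z")])                 # lav only

for name, t in [("t1", t1), ("t2", t2), ("t3", t3)]:
    print(name, is_lav([t]), is_lossless([t]), is_acyclic([t]))
```

The three sample tgds show that the conditions are pairwise independent on these inputs: `t1` is lav and acyclic but not lossless, `t2` is lossless and acyclic but not lav, and `t3` is lav but neither lossless nor acyclic.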

Proof Sketch. In the case of lav tgds, we can observe that the
resolution rules (ren) and (tgd) never increase the number of atoms.
More formally, whenever $I \xrightarrow{\mathrm{resol}}_{\Sigma} J$, we
have $|J| \leq |I|$. There are therefore a finite number of instances
(up to $\equiv$) that can be computed by resolution. As a consequence,
every (complete) derivation terminates and every
$\mathcal{R} \in \mathrm{Fix}(\mathcal{Q}, \xrightarrow{\mathrm{resol}}_{\Sigma})$
is a finite query rewriting. In the case of lossless tgds, the result
follows from a very similar observation: the resolution rules (ren) and
(tgd) never increase the number of variables. (That is,
$I \xrightarrow{\mathrm{resol}}_{\Sigma} J$ implies
$V_J \subseteq V_I$.) Consider finally the case of an acyclic set
$\Sigma$ of tgds. Let $\sigma_\Sigma$ be the set of predicates occurring
in $\Sigma$ and consider the ordering
$\sigma_\Sigma = \{R_1, \ldots, R_n\}$ where $R_{i-1} <_\Sigma R_i$ for
each $1 < i \leq n$. For each instance $I$, consider
$s(I) = (a_1, \ldots, a_n)$ where, for each $i \in \{1, \ldots, n\}$,
$a_i$ is the number of atoms $R'(\bar{v}) \in I$ where $R' = R_i$. We
can observe that, whenever
$I \xrightarrow{\mathrm{resol}}_{\Sigma} J$, the tuple $s(J)$ is smaller
than $s(I)$ with respect to lexicographic order. We can finally conclude
as in the previous cases. □



We next shed more light on the notion of stickiness from [11] which was
discussed in the introduction. In particular, we show that there is in
fact a direct reduction (preserving the property of finite
rewritability) from the class of sticky tgds to the class of lossless
tgds. This reduction proves useful in two ways: (1) it provides a direct
proof of rewritability for the class of sticky tgds (which, unlike [11],
does not require a custom resolution algorithm), and (2) it also allows
us to identify a more general class of rewritable dependencies. In a
nutshell, the key idea of the following reduction consists in replacing
an atom $A(\bar{x}, \bar{y})$ by an atomic formula
$R(\bar{x}) \equiv (\exists \bar{y}, A(\bar{x}, \bar{y}))$ whenever
$A(\bar{x}, \bar{y})$ is the only atom (in a given tgd) where the
variables of $\bar{y}$ occur.

Simplifying atoms. Given a tgd $r : B \rightarrow H$ and an atom $A$ in
the body $B$, we let $X_{A,r} = V_A \cap V_H$ and
$Y_{A,r} = V_A \setminus X_{A,r}$. We then say that $A$ can be
simplified in $r$ when $Y_{A,r}$ is non-empty and disjoint from
$V_{B \setminus A}$. For example, in the tgd

$A(x_1, y_1), B(x_1, x_2, y_2), C(x_1, y_3), D(y_2) \rightarrow R(x_1, x_2, z_1)$

the atoms $A(x_1, y_1)$ and $C(x_1, y_3)$ can be simplified. In
contrast, $B(x_1, x_2, y_2)$ cannot be simplified because $y_2$ occurs
in another atom of the body.
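
The test "can be simplified" is purely syntactic and can be sketched in a few lines. The encoding (atoms as tuples, a tgd as a body list and a head list) and the function name `simplifiable_atoms` are assumptions made for this illustration; the example is the tgd given above.

```python
def atom_vars(atom):
    """Variables of an atom encoded as (predicate, v1, ..., vn)."""
    return set(atom[1:])

def simplifiable_atoms(body, head):
    """Return the body atoms A whose Y_{A,r} is non-empty and private to A."""
    head_vars = set().union(*(atom_vars(a) for a in head))
    result = []
    for i, atom in enumerate(body):
        x = atom_vars(atom) & head_vars   # X_{A,r}: variables shared with the head
        y = atom_vars(atom) - x           # Y_{A,r}: remaining variables of A
        others = set()                    # variables of the body without A
        for j, other in enumerate(body):
            if j != i:
                others |= atom_vars(other)
        if y and not (y & others):
            result.append(atom)
    return result

body = [("A", "x1", "y1"), ("B", "x1", "x2", "y2"),
        ("C", "x1", "y3"), ("D", "y2")]
head = [("R", "x1", "x2", "z1")]
print(simplifiable_atoms(body, head))
```

On this input the function selects the atoms over A and C, in agreement with the example: the atoms over B and D are rejected because they share the variable y2.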

Consider now a set $\Sigma$ of tgds, a tgd $r : B \rightarrow H$ in
$\Sigma$, and an atom $A$ which can be simplified in $r$. Given a tgd
$r' : B' \rightarrow H'$ in $\Sigma$ and an atom $A' \in H'$ we say that
$r'$ unifies with $(A, r)$ when there exists a renaming $\theta_1$ of
$V_A$ and a renaming $\theta_2$ of $V_{B'} \cap V_{H'}$ such that
$A[\theta_1] \in H'[\theta_2]$, in which case $(\theta_1, \theta_2)$ is
called a unifier of $r'$ and $(A, r)$. The following algorithm describes
the new set of tgds $\Sigma_{A,r}$ that results from the simplification
of $(A, r)$ in $\Sigma$:

Let $\bar{x} = (x_1, \ldots, x_n)$ be an ordering of $X_{A,r} = \{\bar{x}\}$
Let $R_A$ be a fresh predicate of arity $n$
Replace $A$ by $R_A(\bar{x})$ in the body of $r$
For all tgds $r' : B' \rightarrow H'$ in $\Sigma$, including $r' = r$
    For all unifiers $(\theta_1, \theta_2)$ of $r'$ and $(A, r)$
        Add the tgd $B'[\theta_2] \rightarrow R_A(\theta_1(\bar{x}))$.

Note that the quantification of the variables may be modified in this
process, and new simplifications may therefore be possible in
$\Sigma_{A,r}$. For instance, the set of tgds

$r_1 : A(x, x, y, z, t) \rightarrow B(x, y)$

$r_2 : C(x, y) \rightarrow \exists u, v,\ A(x, y, u, v, v)$

$r_3 : D(x, y, z, t) \rightarrow A(x, x, y, z, t)$

will, after a simplification step in $r_1$, be replaced by

$r_1' : R_A(x, y) \rightarrow B(x, y)$

$r_2 : C(x, y) \rightarrow \exists u, v,\ A(x, y, u, v, v)$

$r_3 : D(x, y, z, t) \rightarrow A(x, x, y, z, t)$

$r_2' : C(x, x) \rightarrow \exists u,\ R_A(x, u)$

$r_3' : D(x, y, z, t) \rightarrow R_A(x, y)$

and the first atom of $r_3'$ can now be simplified (even though this
atom could not be simplified in $r_3$).

Despite the previous observation, we can check that the process of
simplification always terminates since each step introduces only a
finite number of tgds, and the number of variables in each of these tgds
is strictly decreasing. For a finite set $\Sigma$ of tgds, we finally
define (non-deterministically) the set $\Sigma_\downarrow$ as a set of
tgds obtained from $\Sigma$ by repeating the operation of simplification
until a fixed point is reached.
Theorem 4. For all finite sets $\Sigma$ of tgds and all finite sets
$\mathcal{Q}$ of instances, there is a finite query rewriting for
$(\Sigma, \mathcal{Q})$ iff there is a finite query rewriting for
$(\Sigma_\downarrow, \mathcal{Q})$.

Proof Sketch. Consider a series of simplification steps
$s_1, \ldots, s_n$ where each $s_i$ is characterized by the atom
$A_i(\bar{x}_i, \bar{y}_i)$ that has been simplified at step $s_i$ and
the corresponding atom $R_i(\bar{x}_i)$ that has been introduced. For
each $i$, consider the tgd
$r_i : A_i(\bar{x}_i, \bar{y}_i) \rightarrow R_i(\bar{x}_i)$. Finally,
let $\Gamma = \{r_i\}_{i \in \{1..n\}}$. We can observe that, for all
instances $D$ over the original schema, and every data rewriting
$D_\Gamma$ for $(D, \Gamma)$, we have $D \wedge \Sigma \models Q$ iff
$D_\Gamma \wedge \Sigma_\downarrow \models Q$. Since $\Gamma$ is a set
of Datalog rules, there is a data rewriting $D_\Gamma$ which is finite
iff $D$ is finite. We can also observe that the set of tgds
$\Gamma^{-1} = \{R_i(\bar{x}_i) \rightarrow \exists \bar{y}_i\, A_i(\bar{x}_i, \bar{y}_i)\}$
is acyclic, and for all instances $D'$ of the extended schema, there is
therefore a data rewriting $D'_{\Gamma^{-1}}$ of $(D', \Gamma^{-1})$
which is finite iff $D'$ is finite. Letting $\Pi$ be the operation that
projects an instance of the extended schema on the original schema, we
can then observe that, for all instances $D$ of the original schema, we
have $\Pi((D_\Gamma)_{\Gamma^{-1}}) \equiv D$. This means intuitively
that there exists a one-to-one correspondence (which preserves
finiteness) between the instances of the original schema and the
instances of the extended schema. This is the key argument behind the
proof of Theorem 4. □

Stickiness (Slightly Revisited). Given a set of atoms $B$ and a term
$t$, we denote by $\mathrm{pos}(t, B)$ the set of pairs $(R, i)$, called
positions, such that $B$ contains an atom $R(t_1, \ldots, t_n)$ where
$t_i = t$. Given a set of tgds $\Sigma$, a tgd $B \rightarrow H$ in
$\Sigma$ and an atom $A \in H$, the tgd $r' : B \rightarrow A$ is called
a global-as-view projection of $\Sigma$, denoted
$r' \in \mathrm{Gav}(\Sigma)$. Given a set of tgds $\Sigma$, we define
the set $A_\Sigma$ of affected positions as the smallest set of
positions such that, for all tgds $r \in \mathrm{Gav}(\Sigma)$ of the
form $r : B(X, Y) \rightarrow H(X, Z)$, we have:

(i) $\forall v \in Y,\ \mathrm{pos}(v, B) \subseteq A_\Sigma$

(ii) $\forall u \in X,\ (\mathrm{pos}(u, H) \subseteq A_\Sigma) \Rightarrow (\mathrm{pos}(u, B) \subseteq A_\Sigma)$

We say that $\Sigma$ is sticky iff, for all tgds
$r \in \mathrm{Gav}(\Sigma)$ of the form above, and all $u \in X$ such
that $\mathrm{pos}(u, B) \subseteq A_\Sigma$, the variable $u$ occurs in
only one atom of the body. This definition differs from [11] because of
this last requirement "in only one atom". In contrast, the definition
from [11] requires that $u$ occurs "only once" (in only one atom and in
only one position).

Theorem 5. If $\Sigma$ is a sticky set of tgds, then
$\Sigma_\downarrow$ is a set of lossless tgds, and therefore, for all
sets $\mathcal{Q}$ of instances, there is a finite query rewriting for
$(\Sigma, \mathcal{Q})$.

Proof Sketch. It can be checked that, for every sticky set $\Sigma$ of
tgds, and every atom $A$ that can be simplified in a tgd
$r \in \Sigma$, the set $\Sigma_{A,r}$ resulting from the simplification
of $(A, r)$ is also a sticky set of tgds. Assume now that
$\Sigma_\downarrow$ is sticky and $\Sigma_\downarrow$ contains a tgd
$r : B \rightarrow H$ which is not lossless. Since $r$ is not lossless,
there is an atom $A_h \in H$ and an atom $A_b \in B$ such that
$V_{A_b} \not\subseteq V_{A_h}$. Observe that the tgd
$B \rightarrow A_h$ belongs to $\mathrm{Gav}(\Sigma_\downarrow)$ and
consider a variable $v \in V_{A_b} \setminus V_{A_h}$. By definition of
$A_{\Sigma_\downarrow}$, this variable $v$ occurs in an affected
position, and the stickiness assumption ensures that $v$ occurs only in
the atom $A_b$, meaning that $v \notin V_{B \setminus A_b}$. It follows
that $A_b$ can be simplified in $r$, and this contradicts the definition
of $\Sigma_\downarrow$. Therefore, every tgd in $\Sigma_\downarrow$ is
lossless. □

We can observe that the (revisited) notion of stickiness is
simultaneously a strict generalisation of: (1) the class of lossless
tgds; (2) the original notion defined in [11] from which it is inspired;
and (3) the class of lav tgds, which was not yet covered by (2). Note
also that stickiness could be combined with the (incomparable) notion of
acyclicity discussed in Proposition 1 and/or the class of sticky-join
tgds introduced in [11] to design an even larger class of tractable
settings. However, this is left as future work.

3.4 Integrating the Egds 

This section provides a negative result on egds and query rewriting which motivates two further contributions: (1) a novel technique, also in this section, that allows some egds to be integrated into a set of tgds; and (2) the study, in Section 4, of a richer notion of rewriting based on Datalog.

While the completeness result from Section 3.1 remains of clear interest with both tgds and egds, it turns out that the notion of query rewriting from Section 3.2 is in fact very limited under egds. Intuitively, this is because we considered a notion of first-order rewriting, while dealing with egds often requires the power of second-order logic (or, as we will see, the use of some integration technique which extends the schema). In fact, as illustrated by the following proposition, $(\Sigma, Q)$ is rarely rewritable under egds, even in the case where $\Sigma$ consists of a single egd.

Proposition 2. There is no finite query rewriting for

$\Sigma = \{A(x, y), A(x, y') \to y = y'\}$, and
$Q = \{R(z, z)\}$.

Proof Sketch. An infinite rewriting for $(\Sigma, Q)$ is the set of instances $\mathcal{R} = \{R_n\}_{n \ge 1}$ where each $R_n$ is equal to $\{R(x_1, x_n)\} \cup \{A(x_i, y_i), A(x_{i+1}, y_i) \mid i \le n - 1\}$. We can check that this rewriting $\mathcal{R}$ is not equivalent (up to $\equiv$) to any finite set of instances. Therefore, there is no finite rewriting for $(\Sigma, Q)$. □

Despite the above result, it has been observed in [22] that there are practical scenarios where egds can be 'handled' with a first-order language. It is indeed possible, in some cases, to integrate these egds into the given set of tgds, so as to compute a new set of tgds which, intuitively, does not interact with these egds. As a special case, this integration-based approach covers the scenarios where the given sets of tgds and egds are already non-conflicting, as defined in [TS] or [5]. However, we will also capture scenarios where the original set of tgds properly interacts with the egds.

As in [25] or [5], we next focus on functional dependencies rather than arbitrary egds. The reason for this is that the egds used in practice often consist of functional dependencies, and functional dependencies have a more specific syntax which proves more convenient (in the context of integration). Recall that a functional dependency is a rule of the form $\alpha : R_\alpha[K_\alpha] \to l_\alpha$ where $R_\alpha$ is a predicate of arity $a_\alpha$, $K_\alpha \subseteq \{1, \ldots, a_\alpha\}$ and $l_\alpha \in \{1, \ldots, a_\alpha\} \setminus K_\alpha$. Given an instance $M$, we then let $M \models \alpha$ when, for all atoms of the form $R_\alpha(x_1, \ldots, x_{a_\alpha})$ and $R_\alpha(x'_1, \ldots, x'_{a_\alpha})$ in $M$, there either exists some $k \in K_\alpha$ such that $x'_k \neq x_k$, or it holds that $x'_{l_\alpha} = x_{l_\alpha}$. It is clear that a functional dependency can always be expressed by an equivalent egd. For example, for a binary predicate $A$, the dependency $\alpha : A[1] \to 2$ is equivalent to the egd $A(x, y), A(x, y') \to y = y'$.
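
As an illustration of the satisfaction condition $M \models \alpha$, the following minimal Python sketch checks a functional dependency on a finite instance. It assumes that atoms are encoded as pairs (predicate, tuple of terms) and that terms are compared syntactically; the function name `satisfies_fd` and its parameters are hypothetical.

```python
def satisfies_fd(instance, pred, key, target):
    """Check M |= pred[key] -> target, with 1-based positions.

    instance: iterable of atoms (predicate, (t1, ..., tn));
    key: the positions K of the dependency; target: the position l.
    Terms are compared syntactically (an assumption of this sketch).
    """
    seen = {}
    for p, args in instance:
        if p != pred:
            continue
        k = tuple(args[i - 1] for i in key)
        v = args[target - 1]
        # Two atoms agreeing on all key positions must agree on the target position.
        if seen.setdefault(k, v) != v:
            return False
    return True

# The dependency A[1] -> 2, i.e. the egd A(x,y), A(x,y') -> y = y'
m_ok = {("A", ("a", "b")), ("A", ("c", "b"))}
m_bad = {("A", ("a", "b")), ("A", ("a", "c"))}
print(satisfies_fd(m_ok, "A", [1], 2), satisfies_fd(m_bad, "A", [1], 2))
```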

Integration Heuristic. Given a set $\Sigma$ of tgds and a functional dependency $\alpha : R_\alpha[K_\alpha] \to l_\alpha$, we let $\Sigma_\alpha$ be the set of tgds obtained as follows:

Start with $\Sigma_\alpha := \Sigma$
Let $D_\alpha$ and $F_\alpha$ be two fresh predicates
Let $i_1, \ldots, i_n$ be an ordering of $K_\alpha$
Add to $\Sigma_\alpha$ the two following tgds:
    $R_\alpha(x_1, \ldots, x_{a_\alpha}) \to F_\alpha(x_{i_1}, \ldots, x_{i_n}, x_{l_\alpha})$
    $D_\alpha(x_1, \ldots, x_n) \to \exists y,\ F_\alpha(x_1, \ldots, x_n, y)$
For all tgds $r : B \to H$ in $\Sigma$
    For all atoms $R_\alpha(t_1, \ldots, t_{a_\alpha})$ in $H$
(*)     If $\{t_{i_1}, \ldots, t_{i_n}\} \subseteq V_B$ and $t_{l_\alpha} \notin V_B$
            Add the atom $F_\alpha(t_{i_1}, \ldots, t_{i_n}, t_{l_\alpha})$ to the body of $r$
            Add the tgd $B \to D_\alpha(t_{i_1}, \ldots, t_{i_n})$ to $\Sigma_\alpha$.

We say that $\alpha$ interacts with $\Sigma$ when the lines below (*) in the above algorithm are actually used. That is, when there is a tgd $r : B \to H$ in $\Sigma$ and an atom $A \in H$ of the form $R_\alpha(t_1, \ldots, t_{a_\alpha})$ such that $\{t_i \mid i \in K_\alpha\} \subseteq V_B$ and $t_{l_\alpha} \notin V_B$. Note in particular that $\alpha$ does not interact with $\Sigma$ when $\alpha$ is non-conflicting with $\Sigma$ according to the definition of [9] (which would require here that $\{t_i \mid i \in K_\alpha\} \not\subseteq V_B$).
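
The integration heuristic can be sketched in Python as follows. This is only one possible reading of the algorithm, under stated assumptions: atoms are pairs (predicate, tuple of variables), a tgd is a pair (body, head) of atom lists, existential variables are the head variables that do not occur in the body, and the fresh predicates are named `F_<pred>` and `D_<pred>` (hypothetical names assumed not to clash with the schema). The arity of the constrained predicate is passed explicitly.

```python
def integrate(sigma, pred, arity, key, target):
    """Sketch of the integration of alpha : pred[key] -> target into sigma.

    Returns (sigma_alpha, interacts), where interacts tells whether
    the lines below (*) were used at least once.
    """
    f_pred, d_pred = f"F_{pred}", f"D_{pred}"      # fresh predicates (assumed unused)
    key = sorted(key)                               # a fixed ordering i_1, ..., i_n of K
    xs = tuple(f"x{i}" for i in range(1, arity + 1))
    ks = tuple(xs[i - 1] for i in key)
    result = [
        ([(pred, xs)], [(f_pred, ks + (xs[target - 1],))]),   # R(x1..xa) -> F(x_i1..x_in, x_l)
        ([(d_pred, ks)], [(f_pred, ks + ("y",))]),            # D(..) -> exists y, F(.., y)
    ]
    interacts = False
    for body, head in sigma:
        body_vars = {v for _, args in body for v in args}
        new_body, extra = list(body), []
        for p, args in head:
            if p != pred:
                continue
            kt = tuple(args[i - 1] for i in key)
            lt = args[target - 1]
            if set(kt) <= body_vars and lt not in body_vars:   # test (*)
                interacts = True
                new_body.append((f_pred, kt + (lt,)))
                extra.append((list(body), [(d_pred, kt)]))
        result.append((new_body, list(head)))
        result.extend(extra)
    return result, interacts

# Sigma = {A(x,y) -> exists z, B(x,z) and C(z,y)},  alpha : B[1] -> 2
sigma = [([("A", ("x", "y"))], [("B", ("x", "z")), ("C", ("z", "y"))])]
sigma_alpha, interacts = integrate(sigma, "B", 2, [1], 2)
for tgd in sigma_alpha:
    print(tgd)
```

On this input the sketch produces four tgds: the two generic tgds over the fresh predicates, the original tgd with the extra body atom `F_B(x, z)`, and the tgd `A(x, y) -> D_B(x)`.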

Definition 3. We say that the integration of $\alpha$ succeeds in a set of tgds $\Sigma$ iff the set of tgds $\Sigma_\alpha$ is such that:

• $\alpha$ does not interact with $\Sigma_\alpha$, and

• for all tgds $B' \to H'$ in $\Sigma_\alpha$, $B' \models F_\alpha[1, \ldots, n] \to (n + 1)$.

Lemma 1. If the integration of $\alpha$ succeeds in $\Sigma$ then, for all instances $D$ and $Q$ over the original schema such that $D \models \alpha$, the following statements are equivalent:

• $D \wedge \Sigma \wedge \alpha \models Q$

• $D \wedge \Sigma_\alpha \models Q$

Proof Sketch. Let $U \in Fix(D, \xrightarrow{chase}_{\Sigma_\alpha})$ and recall from Theorem 2 that $U$ is a data rewriting for $(D, \Sigma_\alpha)$. Under the assumptions that $D \models \alpha$ and $\alpha$ does not interact with $\Sigma_\alpha$, we can check that $U \models \alpha$. It follows that $U$ is also a data rewriting for $(D, \Sigma_\alpha \wedge \alpha)$. We can finally check that the following statements are all equivalent: $D \wedge \Sigma_\alpha \models Q$; $U \models Q$; $D \wedge \Sigma_\alpha \wedge \alpha \models Q$; and $D \wedge \Sigma \wedge \alpha \models Q$. □

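Consider now a set of functional dependencies $\mathcal{F}$ and a set $\Sigma$ of tgds. We say that a set $\Sigma_{\mathcal{F}}$ of tgds integrates $\mathcal{F}$ in $\Sigma$ iff there exists a series $\alpha_1, \ldots, \alpha_n \in \mathcal{F}$ and a series $\Sigma_0, \Sigma_1, \ldots, \Sigma_n$ such that:

• $\Sigma_0 = \Sigma$, $\forall i\ \Sigma_{i+1} = (\Sigma_i)_{\alpha_i}$ and $\Sigma_n = \Sigma_{\mathcal{F}}$;

• for all $i$, the integration of $\alpha_i$ succeeds in $\Sigma_i$; and

• there is no remaining $\alpha \in \mathcal{F}$ that interacts with $\Sigma_{\mathcal{F}}$.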

We are now ready to formalize the property of interest which is ensured by the integration heuristic.

Theorem 6. Given a set of tgds $\Sigma_{\mathcal{F}}$ that integrates a set of functional dependencies $\mathcal{F}$ in a set of tgds $\Sigma$, for all instances $D$, all data rewritings $D_{\mathcal{F}}$ of $(D, \mathcal{F})$, and all sets $Q$ of instances, the following statements are equivalent:

• $D \wedge \Sigma \wedge \mathcal{F} \models Q$

• $D_{\mathcal{F}} \wedge \Sigma_{\mathcal{F}} \models Q$

Proof Sketch. The result can be proven by induction on the cardinality of $\mathcal{F}$ using Lemma [1] and the result of separability which was established in [8]. □



Note finally that, since $\mathcal{F}$ is a set of functional dependencies, we can compute a data rewriting $D_{\mathcal{F}}$ for $(D, \mathcal{F})$ in polynomial time (data complexity) using any standard chase procedure. Combining Theorem [6] with the results of the previous section, we finally get the following result:

Corollary 1. Given $\Sigma_{\mathcal{F}}$ that integrates $\mathcal{F}$ in $\Sigma$, if the set $\Sigma_{\mathcal{F}}$ is sticky, then, for all sets $Q$ of instances, the following problem is in Ptime: given an instance $D$, does $D \wedge \Sigma \wedge \mathcal{F} \models Q$?

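We finally provide an example of a scenario taken from [22] which is covered by the approach described in this section:

$\Sigma = \{A(x, y) \to \exists z,\ B(x, z) \wedge C(z, y)\}$

$\mathcal{F} = \{\alpha : B[1] \to 2\}$

$\Sigma_{\mathcal{F}} = \{\ B(x, y) \to F_\alpha(x, y),\quad D_\alpha(x) \to \exists y,\ F_\alpha(x, y),\quad A(x, y) \to D_\alpha(x),\quad A(x, y) \wedge F_\alpha(x, z) \to B(x, z) \wedge C(z, y)\ \}$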

3.5 Intermezzo: Constants and Free Variables 

The goal of this section is to show how the previous results 
can be applied to more realistic databases (with constants 
and nulls) and non-boolean queries (with free variables). 

3.5.1 Hard and Soft Constants

A database $D$ is a set of atoms $R(t_1, \ldots, t_n)$ where each term $t_i$ is either a variable (also known as a labelled null) or a constant from a finite set $\Delta = \Delta_h \uplus \Delta_s$ where: $\Delta_h$ is a set of hard constants, which are subject to the standard unique name assumption (UNA); and $\Delta_s$ is a set of soft constants, which are not subject to the UNA (see e.g. [20]). Given a database $D$ we denote by $D^*$ the instance obtained from $D$ as follows: (1) rename every $c \in \Delta$ into a variable $v_c$; (2) for every $c \in \Delta$ introduce a fresh predicate $R_c$ and add the atom $R_c(v_c)$; (3) introduce a fresh predicate $R_{\neq}$ and, for all $c, c' \in \Delta_h$ such that $c \neq c'$, add the atom $R_{\neq}(v_c, v_{c'})$. These definitions correspond to the standard encoding of constants and can similarly be applied to boolean queries with constants. The following properties are then readily verified: (1) given a database $D$ and a set $\Sigma$ of tgds and egds, $D \wedge \Sigma$ is satisfiable iff $D^* \wedge \Sigma \not\models \{R_{\neq}(x, x)\}$; and (2) when $D \wedge \Sigma$ is satisfiable, for all sets $Q$ of instances, we have $D \wedge \Sigma \models Q$ iff $D^* \wedge \Sigma \models Q^*$. A less obvious observation, formalized below, is that this technique of simulation can also be used for query rewriting under integrable egds.

Proposition 3.

• Given a set of tgds $\Sigma$, an integrable set of functional dependencies $\mathcal{F}$, and a database $D$, the formula $D \wedge \Sigma \wedge \mathcal{F}$ is satisfiable iff $D^* \wedge \Sigma_{\mathcal{F}} \not\models \{R_{\neq}(x, x)\}$.

• Under satisfiability, for all databases $D$, all data rewritings $D^*_{\mathcal{F}}$ of $(D^*, \Sigma_{\mathcal{F}})$ and all sets $Q$ of instances, we have $D \wedge \Sigma \wedge \mathcal{F} \models Q$ iff $D^*_{\mathcal{F}} \wedge \Sigma_{\mathcal{F}} \models Q^*$.

Corollary 2. Given a set $\Sigma$ of tgds and an integrable set $\mathcal{F}$ of functional dependencies, if $\Sigma_{\mathcal{F}}$ is sticky, then, for all sets $Q$ of instances, the following problem is in Ptime: given a database $D$, does $D \wedge \Sigma \wedge \mathcal{F} \models Q$?

3.5.2 Free Variables and Equalities

Recall that a query $Q$ is called a union of conjunctive queries with equalities, denoted $Q \in UCQ^{=}$, when $Q$ is a first-order query of the form

$Q = \{(x_1, \ldots, x_a) \mid \bigvee_j Q_j\}$

where each $x_i$ is called a free variable and each clause $Q_j$ is a finite conjunction of relational atoms and equality atoms (with constants, free variables, and existential variables). Given such a query $Q$, we may consider the set $Q^*$ of instances obtained as follows: (1) rename every constant $c$ into a variable $v_c$ and add the atom $R_c(v_c)$ to each clause; (2) for every free variable $x_i$, introduce a predicate $V_i$ and add the atom $V_i(x_i)$ to each clause; (3) introduce a predicate $R_{=}$ and replace every equality atom $(t = t')$ by the atom $R_{=}(t, t')$; and (4) let $Q^*$ be the resulting set of clauses (which now consist of instances with variables only). Conversely, given an instance $R^*$ of the extended schema (and for a fixed tuple $(x_1, \ldots, x_a)$ of free variables) we let $R$ be the set of relational and equality atoms over the original schema which is obtained by replacing every atom $R_c(u)$ by $(u = c)$ and every atom $V_i(u)$ by $(u = x_i)$. Given a set $\mathcal{R}^*$ of instances over the extended schema, we finally denote by $\mathcal{R}$ the $UCQ^{=}$ query of the form $\mathcal{R} = \{(x_1, \ldots, x_a) \mid \bigvee \{R \mid R^* \in \mathcal{R}^*\}\}$. These definitions are exemplified below:

$\Sigma = \{A(u, v) \to B(u, u)\}$
$Q = \{(x_1, x_2) \mid B(x_1, x_2)\}$

$Q^* = \{\{B(x_1, x_2), V_1(x_1), V_2(x_2)\}\}$
$\mathcal{R}^* = Q^* \cup \{\{A(u, v), V_1(u), V_2(u)\}\}$

$\mathcal{R} = \{(x_1, x_2) \mid B(x_1, x_2) \vee (\exists u, v,\ A(u, v) \wedge x_1 = u \wedge x_2 = u)\}$.

As formalized next, the above technique can be used to generalize the results of the previous sections to the case of non-boolean queries with constants and equalities:

Proposition 4. Given $\Sigma$, $Q$, $Q^*$, $\mathcal{R}^*$, $\mathcal{R}$ as above where

TV e Fix(Q*,^> E ) 

the $UCQ^{=}$ query $\mathcal{R}$ is a rewriting of the $UCQ^{=}$ query $Q$. More formally, for all databases $D$ and all tuples $\bar{c}$ of constants, the following statements are equivalent:

• $\bar{c}$ is an answer of $\mathcal{R}$ in $D$, denoted $\bar{c} \in \mathcal{R}(D)$

• $\bar{c}$ is a certain answer of $Q$ in $D$ under $\Sigma$, meaning that $\bar{c} \in Q(D')$ for all instances $D'$ such that $D \wedge \Sigma \models D'$.

Corollary 3. We can use the technique of saturated resolution to compute a finite $UCQ^{=}$ rewriting for a given $UCQ^{=}$ query whenever there exists one (e.g. under stickiness).

4. DATALOG REWRITING 

As announced in the introduction, we now consider a more general notion of rewriting, called Datalog rewriting, which will prove to be a unifying paradigm of tractability for Datalog±. More precisely, we will show that it captures the class of terminating dependencies (Section 4.1) and the class of guarded tgds (Section 4.2).

Definition 4. Given a set $\Sigma$ of tgds and egds, and a set $Q$ of instances, a Datalog rewriting for $(\Sigma, Q)$ is a triple $(\sigma_A, \Gamma, G)$ where

• $\sigma_A$ is a set of predicates which do not occur in $\Sigma$ or $Q$;

• $\Gamma$ is a finite set of tgds $B \to H$ where $V_H \subseteq V_B$;

• $G$ is an instance of the form $G = \{G()\}$ where $G \in \sigma_A$;

• for all $\sigma_A$-free instances $D$, it holds that:

$D \wedge \Sigma \models Q$ iff $D \wedge \Gamma \models G$.

Note that, in the above definition, each tgd in $\Gamma$ corresponds to a standard Datalog rule (also known as a full tgd). Since $\Gamma$ is required to be finite, $\Gamma$ corresponds to a standard Datalog program. A predicate of $\sigma_A$ will be called an auxiliary predicate and $\sigma_A$ corresponds intuitively to an intentional schema. A $\sigma_A$-free instance is defined as an instance in which no predicate of $\sigma_A$ occurs. That is, a $\sigma_A$-free instance corresponds intuitively to an extensional database. Finally, the instance $G$ is known as the goal of the Datalog program $(\sigma_A, \Gamma, G)$ and the predicate $G$, of arity 0, is known as the goal predicate. The following proposition summarizes the basic properties of Datalog rewritings:

Proposition 5.

• If a Datalog rewriting exists for $(\Sigma, Q)$, the following problem is in Ptime: given $D$, does $D \wedge \Sigma \models Q$?

• If there is a finite query rewriting for $(\Sigma, Q)$ then there is also a Datalog rewriting for $(\Sigma, Q)$.

• There are some pairs $(\Sigma, Q)$ for which a Datalog rewriting exists while no finite query rewriting exists.

Proof Sketch. The first point follows from the following observation: when $\Gamma$ is a set of full tgds, we can compute a data rewriting $U$ for $(D, \Gamma)$ in polynomial time (for a fixed $\Gamma$) using the chase, and we can then test in polynomial time (for a fixed instance $G$) whether $U \models G$. For the second point, given a finite query rewriting $\mathcal{R}$ for $(\Sigma, Q)$ and letting $G = \{G()\}$ for some fresh predicate $G$, we can observe that $(\{G\}, \{R \to G\}_{R \in \mathcal{R}}, G)$ is a Datalog rewriting for $(\Sigma, Q)$. Finally, for $\Sigma = \{R(x, y), R(y, z) \to R(x, z)\}$ and $Q = \{R(x, x)\}$, the triple $(\{G\}, \Sigma \cup \{R(x, x) \to G\}, G)$ is a Datalog rewriting of $(\Sigma, Q)$ while there is no finite (first-order) query rewriting for $(\Sigma, Q)$. □
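
The first and third points of the proof can be illustrated with a naive bottom-up evaluation of full tgds. The Python sketch below is not the paper's procedure: it is a minimal saturation loop, assuming atoms are encoded as pairs (predicate, tuple of terms) and rules as pairs (body, head) whose arguments are all variables; the names `matches` and `saturate` are hypothetical.

```python
def matches(body, facts, theta):
    """Enumerate the valuations extending theta that map every body atom into facts."""
    if not body:
        yield theta
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, vals in facts:
        if fact_pred != pred or len(vals) != len(args):
            continue
        extended = dict(theta)
        # Bind each variable, or check consistency with an earlier binding.
        if all(extended.setdefault(a, v) == v for a, v in zip(args, vals)):
            yield from matches(rest, facts, extended)

def saturate(instance, rules):
    """Least fixpoint of a finite set of full tgds (Datalog rules) over a finite instance."""
    facts = set(instance)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialize the matches first, since facts is modified below.
            for theta in list(matches(body, facts, {})):
                for pred, args in head:
                    fact = (pred, tuple(theta[a] for a in args))
                    if fact not in facts:
                        facts.add(fact)
                        changed = True
    return facts

# Sigma united with {R(x,x) -> G}: transitivity plus the goal rule.
rules = [
    ([("R", ("x", "y")), ("R", ("y", "z"))], [("R", ("x", "z"))]),
    ([("R", ("x", "x"))], [("G", ())]),
]
cyclic = {("R", ("a", "b")), ("R", ("b", "c")), ("R", ("c", "a"))}
acyclic = {("R", ("a", "b")), ("R", ("b", "c"))}
print(("G", ()) in saturate(cyclic, rules), ("G", ()) in saturate(acyclic, rules))
```

The goal is derived exactly when the input relation contains a cycle, a property that no finite union of conjunctive queries can express, which is the intuition behind the third point.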

4.1 From Termination To Datalog 

This section revisits the criterion of oblivious termination which was introduced in [20] and presented in [10] as a language of the Datalog± family. As discussed in [TO], there are alternative criteria of termination that can be considered (see [26] for the current state of the art). Note however that oblivious termination captures the case of weakly acyclic [15] sets of tgds and arbitrary sets of egds, and the results presented in this section can be extended to the classes discussed in [26]. In a nutshell, oblivious termination is based on (1) a technique of simulation that encodes the egds by means of tgds and (2) a standard notion of skolemization which generates a set of rules with function symbols (that is, a logic program). As observed in [20], this logic program enjoys a technical bounded depth property. In turn, we will show in this section that this bounded depth property ensures the existence of a Datalog rewriting.

Simulation. Given a set $\Sigma$ of tgds and egds, we say that $\Sigma'$ is a substitution-free simulation of $\Sigma$, denoted $\Sigma' \in Sim_E(\Sigma)$, when $\Sigma'$ is a set of tgds obtained from $\Sigma$ using the simulation technique from [20] (which, unlike alternative techniques, avoids the use of substitution axioms). More precisely, we let $\Sigma' \in Sim_E(\Sigma)$ when $E$ is a binary predicate which does not occur in $\Sigma$ and $\Sigma'$ can be computed with the following non-deterministic algorithm:

Start from $\Sigma' = \Sigma$.
Add the following tgds to $\Sigma'$:
    $E(x, y) \to E(y, x)$
    $E(x, y), E(y, z) \to E(x, z)$
For all predicates $R$ occurring in $\Sigma$ (of arity $n$)
    Add the following tgd to $\Sigma'$:
    $R(x_1, \ldots, x_n) \to E(x_1, x_1), \ldots, E(x_n, x_n)$
Repeat (until fixed point)
    If there is a tgd $B \to H$ or an egd $B \to x = y$ in $\Sigma'$ and
    a variable $x \in V_B$ that occurs more than once in $B$
        Let $x'$ be a fresh variable
        Replace one occurrence of $x$ by $x'$ in $B$
        Add the atom $E(x, x')$ to $B$
For all egds $r : B \to x = y$ in $\Sigma'$
    Replace $r$ by the tgd $B \to E(x, y)$
Return $\Sigma'$.

Skolemization. Given a set of tgds $\Sigma$, we denote by $P_\Sigma$ the logic program which is obtained by skolemizing $\Sigma$ in a standard way. That is, for all $B(X, Y) \to H(X, Z)$ in $\Sigma$, the program $P_\Sigma$ contains a rule $B(X, Y) \to H'(X)$ where $H'$ is obtained from $H$ by replacing every variable $z \in Z$ by a term $f(\bar{x})$ where $f$ is a fresh function symbol and the tuple $\bar{x}$ is a fixed ordering of $X$. Given an instance $I$ and such a logic program $P_\Sigma$ we then denote by $P_\Sigma(I)$ the fixed point of $I$ and $P_\Sigma$ (also known as the minimal Herbrand model).
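
The skolemization step can be sketched as follows in Python. The encoding is an assumption of this sketch, not the paper's: variables are strings, a skolem term $f(t_1, \ldots, t_n)$ is a tuple `(f, t1, ..., tn)`, and the fresh function symbols are given the hypothetical names `f<rule index>_<variable>`. The fixed ordering of $X$ is taken to be the alphabetical one.

```python
def skolemize(sigma):
    """Replace each existential variable z by a skolem term over the frontier X.

    sigma: list of tgds (body, head); atoms are (predicate, (v1, ..., vn)).
    Returns the rules B(X, Y) -> H'(X) of the skolemized program.
    """
    program = []
    for n, (body, head) in enumerate(sigma):
        body_vars = {v for _, args in body for v in args}
        head_vars = {v for _, args in head for v in args}
        frontier = tuple(sorted(body_vars & head_vars))        # fixed ordering of X
        skolem = {z: (f"f{n}_{z}",) + frontier                 # one fresh symbol per z in Z
                  for z in head_vars - body_vars}
        new_head = [(p, tuple(skolem.get(v, v) for v in args)) for p, args in head]
        program.append((list(body), new_head))
    return program

# A(x,y) -> exists z, B(x,z) and C(z,y)
sigma = [([("A", ("x", "y"))], [("B", ("x", "z")), ("C", ("z", "y"))])]
print(skolemize(sigma))
```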

Definition 5.

• We say that a set $\Sigma$ of tgds ensures oblivious termination iff, for all finite instances $I$, $P_\Sigma(I)$ is finite.

• Given a set $\Sigma$ of tgds and egds, we say that $\Sigma$ ensures oblivious termination iff there exists $\Sigma' \in Sim_E(\Sigma)$ such that $\Sigma'$ ensures oblivious termination.

Bounded Depth. Given a skolem term $t$ with variables and function symbols, we define the depth $d(t)$ of $t$ in a standard way. More precisely, given a variable $x$ we let $d(x) = 1$, and given a term $t = f(t_1, \ldots, t_n)$, we let $d(t) = 1 + \max_{i \le n}(d(t_i))$. Given a set of tgds $\Sigma$, an instance $D$ and an integer $k$ we denote by $P^k_\Sigma(D)$ the set of atoms with skolem terms which is obtained by applying (inductively) all the rules $B(X, Y) \to H'(X)$ of $P_\Sigma$, but only for the valuations $\theta$ of $X \cup Y$ such that, for all $u \in X \cup Y$, the depth of $\theta(u)$ is at most $k$. It can be checked that $P^k_\Sigma(I)$ is finite and well-defined whenever $\Sigma$, $I$ and $k$ are finite. In particular, the order of application of the rules does not matter. Note here that $P^k_\Sigma(I)$ is a skolem instance (with skolem terms) rather than a standard instance (with variables only) but the definitions from Section 3.1 can nonetheless be extended in a natural way. In particular, given a set $\mathcal{Q}$ of instances, we let $P^k_\Sigma(I) \models \mathcal{Q}$ iff, for all $Q \in \mathcal{Q}$, there is a mapping $\theta$ from $V_Q$ to the terms of $P^k_\Sigma(I)$ such that $Q[\theta] \subseteq P^k_\Sigma(I)$.
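
The depth function and the depth restriction on valuations can be written down directly. The Python sketch below assumes, as an encoding choice of this sketch only, that variables are strings and that a skolem term $f(t_1, \ldots, t_n)$ is a tuple `(f, t1, ..., tn)`; the text does not define the depth of a 0-ary function term, so the value 1 used for that case is an assumption.

```python
def depth(t):
    """d(x) = 1 for a variable x; d(f(t1..tn)) = 1 + max_i d(ti)."""
    if isinstance(t, str):
        return 1
    # default=0 covers a function symbol without arguments (assumption of this sketch)
    return 1 + max((depth(s) for s in t[1:]), default=0)

def within_depth(theta, k):
    """True iff every term in the image of the valuation theta has depth at most k."""
    return all(depth(t) <= k for t in theta.values())

print(depth("x"), depth(("f", "x", "y")), depth(("g", "y", ("f", "x", "z"))))
```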

Definition 6. Given a set $\Sigma$ of tgds and a set $Q$ of instances, we say that $(\Sigma, Q)$ has bounded depth iff there exists a finite integer $k$ (depending only on $(\Sigma, Q)$) such that, for all instances $D$, the following statements are equivalent:

• $D \wedge \Sigma \models Q$

• $P^k_\Sigma(D) \models Q$

The following result was established in [20]:

Lemma 2. If $\Sigma$ is a finite set of tgds ensuring oblivious termination, there exists $k$ (depending only on $\Sigma$) such that, for all instances $D$, $P_\Sigma(D) = P^k_\Sigma(D)$. Therefore, for all sets $Q$ of instances, $(\Sigma, Q)$ has bounded depth.

We next present the main result of this section. 

Theorem 7. Given a set $\Sigma$ of tgds and a set $Q$ of instances, if $(\Sigma, Q)$ has bounded depth (and if we know the bound $k$), we can compute a Datalog rewriting for $(\Sigma, Q)$.

Proof Sketch. The key idea of the proof is to use fresh predicate symbols to simulate the effect of the function symbols of $P_\Sigma$. Intuitively, every atom $R(\bar{t})$ with skolem terms can indeed be simulated with a standard atom $R_s(\bar{x})$, with variables only, where $s$ encodes the "shape" of each term, while $\bar{x}$ corresponds to the variables that occur in $\bar{t}$. For instance:

$R(x, f(x, y), g(y, f(x, z))) \sim R_{1, f(1,2), g(2, f(1,3))}(x, y, z)$
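
The shape encoding illustrated by this example can be sketched in Python. The sketch assumes that variables are strings, that a skolem term is a tuple `(f, t1, ..., tn)`, and that variables are numbered in order of first occurrence; the way the shape is attached to the predicate name (square brackets) is a hypothetical convention of this sketch.

```python
def flatten_atom(pred, args):
    """Turn an atom with skolem terms into (shape-indexed predicate, tuple of variables)."""
    order = {}  # variable -> index, in order of first occurrence

    def shape(t):
        if isinstance(t, str):
            return str(order.setdefault(t, len(order) + 1))
        return f"{t[0]}({','.join(shape(s) for s in t[1:])})"

    encoded = ",".join(shape(t) for t in args)
    return f"{pred}[{encoded}]", tuple(order)

atom = ("R", ("x", ("f", "x", "y"), ("g", "y", ("f", "x", "z"))))
print(flatten_atom(*atom))
```

Since the number of shapes of depth at most $k$ is finite for a fixed set of function symbols, only finitely many such predicates are needed.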

For a fixed bound $k$, for every rule $B(X, Y) \to H'(X)$ in $P_\Sigma$ and for every valuation $\theta$ of $X \cup Y$ such that $\max\{d(\theta(v)) \mid v \in X \cup Y\} \le k$ we can then translate every skolem atom in the rule $B[\theta] \to H'[\theta]$ into a standard atom. For instance, if $k = 2$ and $P_\Sigma$ contains only one binary skolem function $f$, we can replace the rule

$A(x, y) \to B(x, y, f(x, y))$

by a set of Datalog rules containing, among others, the rules:

$A(x, y) \to B_{1, 2, f(1,2)}(x, y)$

$A_{1, f(1,1)}(x) \to B_{1, f(1,1), f(1, f(1,1))}(x)$
$A_{1, f(1,2)}(x, y) \to B_{1, f(1,2), f(1, f(1,2))}(x, y)$
$A_{f(1,2), f(1,3)}(x, y, z) \to B_{f(1,2), f(1,3), f(f(1,2), f(1,3))}(x, y, z)$

Consider now $Q$ of the form $Q = \{G()\}$ such that $(\Sigma, Q)$ has bounded depth $k$. Let $\sigma_A$ be the set of predicates $R_s$ where $s$ encodes a shape of depth $\le k$. Let $\Gamma^k$ be the set of Datalog rules resulting from the above construction. We can check in this case that $(\sigma_A, \Gamma^k, \{G()\})$ is a Datalog rewriting for $(\Sigma, Q)$. In the general case (when $\mathcal{Q}$ is not already of the form $\{G()\}$), we may introduce a fresh 0-ary predicate $G$, consider all the tgds $r_Q : Q \to G()$ where $Q \in \mathcal{Q}$, and consider all the valuations $\theta$ of $X_Q$ such that $\max\{d(\theta(x)) \mid x \in X_Q\} \le k$. We can then encode each of these rules with a Datalog rule over $\sigma_A \cup \{G\}$, and we can conclude as in the previous case. □

Corollary 4. There is an algorithm that, given a set $\Sigma$ of tgds and egds ensuring oblivious termination, and given a set $Q$ of instances, computes a Datalog rewriting for $(\Sigma, Q)$.

4.2 From Guardedness To Datalog 

In this section, we consider the class of guarded tgds. Intuitively, we will say that a tgd is guarded when there is an atom in the body (called a guard) that contains all the universal variables. In turn, a variable is called universal iff it occurs both in the body and the head. Note that the variables that occur only in the body are not taken into account in this definition of guardedness. Therefore, the class of guarded tgds contains, as a special case, the tgds $B \to G()$ where $B$ is an arbitrary instance and $G$ is a 0-ary predicate (for example, a goal predicate) because such tgds have no universal variable. Another example of a guarded tgd is

$A(x, y, z), B(i, x, y), B(j, y, z) \to \exists k,\ B(k, x, z)$

where the set of universal variables is $\{x, z\}$ and the atom $A(x, y, z)$ is a guard. In contrast, the tgd

$B(i, x, y), B(j, y, z) \to \exists k,\ B(k, x, z)$

is not guarded since there is no atom in the body that contains both $x$ and $z$.

Definition 7. A tgd $B \to H$ is guarded iff there is an atom $G \in B$ such that $(V_B \cap V_H) \subseteq V_G$.

4.2.1 The Case of β-Guardedness

A special class of guarded tgds was considered in [HI [TO], called β-guarded tgds in this section, that complies with the following definition:

Definition 8. A tgd $B \to H$ is called β-guarded iff there exists an atom $G \in B$ such that $V_B \subseteq V_G$.
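
Definitions 7 and 8 can be compared on the two example tgds of this section with a short Python sketch. It assumes atoms are encoded as pairs (predicate, tuple of variables) and that existential variables are simply the head variables absent from the body; the function names are hypothetical.

```python
def vars_of(atoms):
    return {v for _, args in atoms for v in args}

def is_guarded(body, head):
    """Definition 7: some body atom contains all variables shared by body and head."""
    universal = vars_of(body) & vars_of(head)
    return any(universal <= set(args) for _, args in body)

def is_beta_guarded(body, head):
    """Definition 8: some body atom contains all the variables of the body."""
    body_vars = vars_of(body)
    return any(body_vars <= set(args) for _, args in body)

head = [("B", ("k", "x", "z"))]
with_guard = [("A", ("x", "y", "z")), ("B", ("i", "x", "y")), ("B", ("j", "y", "z"))]
without_guard = [("B", ("i", "x", "y")), ("B", ("j", "y", "z"))]

print(is_guarded(with_guard, head), is_beta_guarded(with_guard, head))
print(is_guarded(without_guard, head))
```

The first tgd is guarded by `A(x, y, z)` but is not β-guarded, because no single body atom also contains the body-only variables i and j.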

Note that a β-guarded tgd is (only) a special case of a guarded tgd since the requirement $V_B \subseteq V_G$ is stronger than $(V_B \cap V_H) \subseteq V_G$. For example, the tgd

$A(x, y, z), B(i, x, y), B(j, y, z) \to \exists k,\ B(k, x, z)$

is guarded but not β-guarded. While β-guardedness is slightly less general, it was shown in [8] that the class of β-guarded tgds remains a very natural class to consider. In particular, it was shown in [8] that we only need β-guardedness (as opposed to general guardedness) to cover interesting classes of ontologies, including languages from the DL-lite family [3]. Another important advantage of β-guardedness is that it ensures a useful property, established in [8], called the bounded guard-depth property. This property (defined for β-guarded tgds only) is very relevant here since it coincides with the general property of bounded depth discussed in the previous section. Combining the results in [8] with this observation, we obtain the following results:

Lemma 3. If $\Sigma$ is a finite set of β-guarded tgds and $Q$ is a finite set of instances, then $(\Sigma, Q)$ has bounded depth $k$ for some computable $k$ that depends both on $\Sigma$ and $Q$.

Corollary 5 (of Theorem [7]). There is an algorithm that, given a set $\Sigma$ of β-guarded tgds and a set $Q$ of instances, computes a Datalog rewriting for $(\Sigma, Q)$.

As already discussed, a β-guarded tgd is only a special case of a guarded tgd, and the work of [5] suggests that there is no trivial reduction from the class of guarded tgds to the class of β-guarded tgds. This is why we consider an alternative approach, in the following section, for the more general case.

4.2.2 From Guardedness To Flatness 

In this section, we provide a proof (sketch), based on a technical notion of flatness, for the following result:

Theorem 8. For every finite set $\Sigma$ of guarded tgds and set $Q$ of instances, there is a Datalog rewriting for $(\Sigma, Q)$.

The key idea of the proof can be summarized as follows: when $\Sigma$ is a set of guarded tgds, there exists an equivalent set of tgds $\Sigma'$ which enjoys the flat chase property. This property means intuitively that it is sufficient to chase the tgds $B(X, Y) \to H(X, Z)$ of $\Sigma'$ for the renamings of $X$ that map $X$ to the variables of the original instance $D$ (which intuitively correspond to constant values). In turn, the flat chase property can be linked with the property of bounded depth (for the depth $k = 1$) and Theorem [7] can be used again to prove the existence of a Datalog rewriting.

Flat Chase. Consider a triple $(D, I, J)$ of instances and a set $\Sigma$ of tgds. When $I \xrightarrow{chase}_{\Sigma} J$ we say that the chase step is flat with respect to $D$, denoted $I \to_{\Sigma} J$, when there is a tgd $B(X, Y) \to H(X, Z)$ in $\Sigma$ and two substitutions $\theta_X$ and $\theta_Y$ such that:

• $B[\theta_X, \theta_Y] \subseteq I$ and $J = I \cup H[\theta_X, \theta_Z]$ for some $I$-fresh renaming $\theta_Z$; and

• in addition, for all $x \in X$, it holds that $\theta_X(x) \in V_D$.

Given an instance $U$, we finally let $U \in Flat(D, \Sigma)$ when there exists a finite derivation of the form

$D = I_0 \to_{\Sigma} I_1 \to_{\Sigma} \cdots \to_{\Sigma} I_n = U$.

Definition 9. Given a set $\Sigma$ of tgds and a set $Q$ of instances, we say that $(\Sigma, Q)$ has the flat chase property iff, for all instances $D$, the following statements are equivalent:

• $D \wedge \Sigma \models Q$

• $\exists U \in Flat(D, \Sigma),\ U \models Q$

Lemma 4. If $(\Sigma, Q)$ has the flat chase property, then $(\Sigma, Q)$ has bounded depth (for the depth $k = 1$) and therefore, there exists a Datalog rewriting for $(\Sigma, Q)$.

The proof of Theorem [8] finally relies on Lemma [5] below.

Lemma 5. For all finite sets $\Sigma$ of guarded tgds and all finite sets $Q$ of instances, there exists a finite set $\Sigma'$ of tgds such that $\Sigma \equiv \Sigma'$ and $(\Sigma', Q)$ has the flat chase property.

Proof Sketch. Given a tgd $r : B(X, Y) \to H(X, Z)$, a refinement of $r$ is a tgd $r' = B' \to H'$ where $B' = B[\theta_X, \theta_Y]$ and $H' = H[\theta_X, \theta_Z]$ for some renamings $\theta_X, \theta_Y, \theta_Z$ where $\theta_Z$ is a $B'$-fresh renaming of $Z$. We say that $r'$ is a careful refinement of $r$ when, in addition, $\theta_Y$ is a $H'$-fresh renaming. Given a set of tgds $\Sigma$, we let $Ref(\Sigma)$ (resp. $CRef(\Sigma)$) be the set of tgds corresponding to a refinement (resp. careful refinement) of some tgd of $\Sigma$. Given a tgd $r : B \to H$ we say that $r$ is of the split form

$r : G(X, U), B'(X', U', V) \to H(X, Z)$

when $X$ and $Z$ are defined as usual, $G$ is a subset of $B$ satisfying $X \subseteq V_G$, $U$ is the set of remaining variables in $G$, $B'$ is the set of atoms in $B \setminus G$, $V$ is the set of variables that occur only in $B'$, and finally $(X', U') = (X \cap V_{B'}, U \cap V_{B'})$. Given two sets $\Sigma_1$ and $\Sigma_2$ of tgds, we say that a tgd $r_3$ is derived from $(\Sigma_1, \Sigma_2)$ when there exists a tgd $r_1 \in CRef(\Sigma_1)$ and a tgd $r_2 \in Ref(\Sigma_2)$ such that:

• $r_1 : B_1(X_1, Y_1) \to H_1(X_1, Z_1)$;

• $r_2 : G_2(X_2, U_2), B'_2(X'_2, U'_2, V_2) \to H_2(X_2, Z_2)$ where:

    1. $G_2$ is non-empty and contained in $H_1$;

    2. $(X_2 \cup U_2 \cup V_2)$ is disjoint from $(Z_1 \cup Y_1)$;

    3. $Z_2$ is disjoint from $(X_1 \cup Y_1 \cup Z_1)$; and

• $r_3 = (B_1 \cup B'_2) \to (H_1 \cup H_2)$.

It follows here from 1. and 2. that $X_2 \cup U_2 \subseteq X_1$ while $V_2$ is disjoint from $Y_1$. Therefore, the tgd $r_3$ is of the split form

$r_3 : G_3(X_3, U_3), B'_3(X'_3, U'_3, V_3) \to H_3(X_3, Z_3)$

where $G_3 = B_1$, $X_3 = X_1$, $U_3 = Y_1$, $B'_3 = B'_2$, $X'_3 = X'_2 \cup U'_2$, $V_3 = V_2$, $H_3 = H_1 \cup H_2$, and $Z_3 = Z_1 \cup Z_2$.

Fact 1. When a tgd $r_3$ is derived from $(\Sigma_1, \Sigma_2)$, it holds that $\Sigma_1 \cup \Sigma_2 \models r_3$.

Proof Sketch. Using the previous notations, we have

$(B_1 \cup B'_2) \xrightarrow{chase}_{\{r_1\}} (B_1 \cup B'_2 \cup H_1) \xrightarrow{chase}_{\{r_2\}} (B_1 \cup B'_2 \cup H_1 \cup H_2)$

and it follows that $\{r_1, r_2\} \models r_3$.

Given an integer $k$ we say that a tgd $r$ is $k$-guarded if $r$ is of the form

$r : G(X, U), B'(X', U', V) \to H(X, Z)$

for some instance $G$ with only one atom (that is, a guard atom), and there exists a set of instances $(B^i)_{i \le n}$ called a $(G, k)$-decomposition of $B$ such that: (1) $B' = \bigcup_{i \le n} B^i$; (2) for each $i \le n$, $B^i$ has at most $(k - 1)$ atoms; and (3) for all $i \neq j$, it holds that $V_{B^i} \cap V_{B^j} \subseteq V_G$. We define the guard width of a guarded tgd $r$, denoted $gw(r)$, as the smallest $k$ such that $r$ is $k$-guarded. In contrast, we define the left width of a tgd $r$, denoted $lw(r)$, as the number of atoms in the body of $r$. Note that every guarded tgd $r$ is such that $gw(r) \le lw(r)$. Given a set of tgds $\Sigma$, we finally let $gw(\Sigma) = \max\{gw(r) \mid r \in \Sigma\}$ and $lw(\Sigma) = \max\{lw(r) \mid r \in \Sigma\}$.

Fact 2. When a tgd $r_3$ is derived from $(\Sigma_1, \Sigma_2)$, it holds that $gw(r_3) \le \max(gw(\Sigma_1), lw(\Sigma_2))$.

Proof Sketch. It can be checked that $lw(r') \le lw(r)$ whenever $r'$ is a refinement of $r$ and $gw(r') \le gw(r)$ whenever $r'$ is a careful refinement of $r$. Let $r_1 \in CRef(\Sigma_1)$ be a $k$-guarded tgd of the form

$r_1 : G_1(X_1, U_1), B'_1(X'_1, U'_1, V_1) \to H_1(X_1, Z_1)$

with a $(G_1, k)$-decomposition $B'_1 = \bigcup_{i \le n} B^i_1$. Let $r_2 \in Ref(\Sigma_2)$ such that $lw(r_2) \le k$ and $r_2$ is of the form

$r_2 : G_2(X_2, U_2), B'_2(X'_2, U'_2, V_2) \to H_2(X_2, Z_2)$.

Finally, let $r_3$ be the tgd derived from $r_1$ and $r_2$, and recall that $r_3$ is of the form

$r_3 : G_3(X_3, U_3), B'_3(X'_3, U'_3, V_3) \to H_3(X_3, Z_3)$

where $G_3 = G_1 \cup B'_1$, $X_3 = X_1$, $U_3 = U_1 \cup V_1$, and $B'_3 = B'_2$. Since $lw(r_2) \le k$ and $G_2$ is non-empty, we have $|B'_2| \le (k - 1)$. For all $i \le n$, we have $V_{B^i_1} \cap V_{B'_2} \subseteq X_1 \subseteq V_{G_1}$.

Therefore, the set of instances $\{B^i_1\}_{i \le n} \cup \{B'_2\}$ is a $(G_1, k)$-decomposition of the body of $r_3$ and $r_3$ is $k$-guarded.

Given a set of tgds $\Sigma$, we define the flattening of $\Sigma$, denoted $\Sigma^\infty$, as the minimal set of tgds $\Sigma'$ such that $\Sigma \subseteq \Sigma'$ and $\Sigma'$ contains all the tgds that can be derived from $(\Sigma', \Sigma)$.

Fact 3. For all sets $\Sigma$ of tgds and all sets $Q$ of instances, $(\Sigma^\infty, Q)$ has the flat chase property.

Proof Sketch. Consider an instance $D$ and suppose that $D \wedge \Sigma \models Q$. By completeness of the chase, we know that there exists a derivation

$D = I_0 \xrightarrow{chase}_{\Sigma} I_1 \xrightarrow{chase}_{\Sigma} \cdots \xrightarrow{chase}_{\Sigma} I_n$

such that $I_n \models Q$. If this derivation is flat, and since $\Sigma \subseteq \Sigma^\infty$, we have $I_n \in Flat(D, \Sigma^\infty)$ and $I_n \models Q$. Suppose now that the derivation is not flat and consider the first integer $j \le n$ such that $I_j$ is obtained from $I_{j-1}$ with a chase step which is not flat. We can then check that there exists a tgd $r_2 : B_2(X_2, Y_2) \to H_2(X_2, Z_2)$ in $Ref(\Sigma)$ such that $B_2 \subseteq I_{j-1}$, $I_j = I_{j-1} \cup H_2$ and $X_2 \not\subseteq V_D$. Consider

a guard atom G(v) for r 2 . Since X C {v} and X % V(D), 
there exits some i < j such that G(v) € (/» \/<_i). Let G 2 
be the set of all the atoms of B 2 that belong to (Ij 
Write r 2 under the form 

r 2 : G 2 (X 2 , U 2 ),B' 2 (X 2 , U' 2 , V 2 ) -> H 2 (X 2 ,Z 2 ). 

and observe that G 2 is non-empty (since it contains G(u)). 
Considering Nj = Vj j \/ j _ 1 , since the derivation is flat 
from the step 1 to the step j, we can observe that the only 
atom of Jj_i containing a variable of Nj are the atoms of 
Ij\Ij-i- Therefore, (U 2 U V 2 ) is disjoint from Nj. By 
definition of careful refinements, we can now consider a 
tgd r\ e CRef(E) of the form 

n : Bi(Xi,Yi) ^ Bi(Xi,Zi) 

where Y\ is set of variables disjoint from (X 2 ,U 2 ,V 2 ), 
Bi[#Yi] C 7j_i for some renaming 6y 1 of Y\ and Ji = 
ij_i U Bi. Let r 3 = B 3 — > B 3 be the tgd such that 
B 3 — B\ U B 2 and H 3 = Bi U H 2 , and observe that 
r 3 € E°°. Let 6' be an injective renaming of Z\ U Z 2 
into a set of fresh variables (disjoint from Vi n ) and for all 
k < J, let J^. = 7fc[0']. We can observe that we have 

$D = I_0 \xrightarrow{chase}_{\Sigma} \cdots \xrightarrow{chase}_{\Sigma} I_{j-1} \xrightarrow{chase}_{r_3} I'_j \xrightarrow{chase}_{\Sigma} \cdots \xrightarrow{chase}_{\Sigma} I'_n$

where the step $I_{j-1} \xrightarrow{chase}_{r_3} I'_j$ is now a flat step.

We can finally generalize the above construction to show (by induction on $n$) that every chase derivation of $D$ by $\Sigma$ (leading to $I_n \models Q$) can be transformed into a flat derivation of $D$ by $\Sigma^\infty$ (leading to $I'_n \models Q$).

Given a tgd $r : B(X, Y) \to H(X, Z)$, we define a left-core projection of $r$ as a minimal (for $\subseteq$) set of atoms $B' \subseteq B$ such that $B' = B[\theta_Y]$ for some renaming $\theta_Y$ of $Y$. Given a set of tgds $\Sigma$, we let $LC(\Sigma)$ be the set of instances $B'$ such that $B'$ is a left-core projection of some tgd $r \in \Sigma$.

Fact 4. Given a finite schema $\sigma_0$, an integer $k_0$, and an infinite set $\Sigma$ of $k_0$-guarded tgds over $\sigma_0$, the set $LC(\Sigma)$ is finite up to isomorphism.

Proof Sketch. On a fixed schema $\sigma_0$, there are (up to isomorphism) only a finite number of atoms that can be used as a guard $G$ for a decomposition $(B^i)_{i \le n}$. For a fixed integer $k_0$ and for every instance $B = G \cup \bigcup_i B^i$ corresponding to the left-core projection of some tgd, there is only a finite number of possible blocks $B^i$ satisfying $|B^i| \le k_0 - 1$ that can be used in the composition, up to bijective renaming of $V_{B^i} \setminus V_G$. It follows that each $B \in LC(\Sigma)$ is
finite and that LG(E) is finite up to isomorphism. 

Given a set of tgds $E$ and an integer $k$, we say that a tgd
$r : B \to H$ is a $k$-tgd of $E$, denoted $r \in E^{(k)}$, when there exists
a tgd $B' \to H'$ in the flattening $E^\infty$ of $E$ such that: (1) $H$
is a subset of $H'$ containing at most $k$ atoms; and (2) $B$ is
a left-core projection of $B' \to H$. As a corollary of Fact 4,
it can be observed that $E^{(k)}$ is finite (up to isomorphism)
whenever $E$ is a finite set of guarded tgds.
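
Condition (1) can be illustrated with a short Python sketch that enumerates the candidate heads of a single tgd. The tuple encoding of atoms, the function name `candidate_heads`, and the restriction to non-empty heads are assumptions of this illustration. For each candidate head $H$, the sketch also returns the body variables that do not occur in $H$, that is, the variables that a renaming is allowed to collapse when a left-core projection is taken for condition (2).

```python
from itertools import combinations

def variables(atoms):
    """Collect every term occurring in a set of atoms."""
    return {t for a in atoms for t in a[1:]}

def candidate_heads(body, head, k):
    """Yield (H, Y) for every non-empty H ⊆ head with |H| <= k, where Y is the
    set of body variables that do not occur in H."""
    atoms = sorted(head)
    for size in range(1, min(k, len(atoms)) + 1):
        for sub in combinations(atoms, size):
            h = frozenset(sub)
            yield h, variables(body) - variables(h)

# Hypothetical usage on the tgd P(x, y) -> R(x, z), S(y, w) with k = 1.
pairs = list(candidate_heads(
    {("P", "x", "y")}, {("R", "x", "z"), ("S", "y", "w")}, 1
))
```

Shrinking the head can only enlarge the set of body-only variables, which is why the body of a $k$-tgd may admit smaller left-core projections than the body of the original tgd.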

Fact 5. For all sets $E$ of guarded tgds, all sets $Q$ of
instances and all integers $k$ such that

$k > \max\{|B| : B \in LC(E^\infty) \cup Q\}$

$(E^{(k)}, Q)$ enjoys the flat chase property.

Proof Sketch. The property can be shown by adapting the
proof of Fact 3. Indeed, a flat derivation by $E^\infty$ can be
modified in two ways that preserve completeness: (1) one
may replace the body of a tgd by a left-core projection of
this tgd; and (2) one may select, for each tgd $B \to H$, the
atoms from $H$ which will be used later in the derivation (so
as to enforce that $|H| \leq k$).

We can finally use Fact 5 to conclude the proof of Lemma [5] 
and Theorem □ 

5. CONCLUSION AND FUTURE WORK 

This paper presented contributions along two axes: 

More General Classes. Considering a complete resolution
procedure extends the possible fields of application
of Datalog±. Resolution can be used, first, to generalize
existing tractability criteria (such as stickiness), but it can
also be used, in practice, without any syntactic assumption.
A technique of integration or simulation, however, often
proves useful (and sometimes necessary) to handle egds.
Finally, Datalog rewriting covers the three main paradigms of
Datalog±: stickiness, termination and guardedness. There
are also alternative paradigms of tractability, considered
in [4], which have not been discussed here (to simplify
the discussion) and have yet to be compared with the class
of Datalog-rewritable dependencies. In particular, a natural
question (left as future work) is the following: is Datalog
rewriting always possible under bounded tree-width [3]?

More Efficient Algorithms. This paper introduced
several algorithms and heuristics that can be of clear
practical use. In particular, resolution can certainly (at least
in some contexts) be more efficient than the chase. The
technique of Datalog rewriting can also prove useful in practice.
In particular, one may consider Magic Sets or any other
standard optimisation of Datalog to improve efficiency.
Nonetheless, there is still a long road ahead because the algorithms
for Datalog rewriting proposed in Section 4 are (for the
moment) far from optimal. An interesting and challenging
question that remains is the following: how can one compute,
in practice, a Datalog rewriting of reasonable size?